*Original Paper: Lipman, Y., Chen, R. T. Q., Ben-Hamu, H., Nickel, M., & Le, M. (2023). Flow Matching for Generative Modeling. arXiv:2210.02747.*
Let $x_t$ denote the position of a $d$-dimensional point of a fluid in $\mathbb{R}^d$. The time evolution of $x_t$ forms a bundle of time trajectories, the "flow" of the fluid with respect to time, referred to as the flow (or current) $\phi_t$. The flow (or current) follows a velocity field written as $v_t(x)$, where $v : [0,1] \times \mathbb{R}^d \to \mathbb{R}^d$. Specifically, given a velocity field $v_t$ we can completely characterize $x_t$ given we know its initial position $x_0$. Indeed, it is clear that the first-order ODE

$$\frac{d}{dt}\phi_t(x_0) = v_t\big(\phi_t(x_0)\big), \qquad \phi_0(x_0) = x_0$$

admits a unique solution (under mild regularity conditions on $v_t$).
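As a concrete illustration (not from the paper), here is a minimal numerical sketch: given any velocity field `v(t, x)`, forward-Euler integration of the ODE above traces out the flow. The particular field used here, a simple contraction toward a fixed target point, is a made-up placeholder.

```python
import numpy as np

def v(t, x, target=np.array([2.0, -1.0])):
    # Hypothetical velocity field: pushes points toward a fixed target.
    # A stand-in for illustration, not a learned field.
    return target - x

def integrate_flow(x0, v, n_steps=100):
    """Forward-Euler integration of dx/dt = v(t, x) over t in [0, 1]."""
    x, dt = x0.astype(float), 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * v(i * dt, x)
    return x

x0 = np.random.randn(2)        # initial position x_0 (e.g. drawn from noise)
x1 = integrate_flow(x0, v)     # approximate phi_1(x_0)
print(x0, "->", x1)
```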
The key insight of flow matching is to learn a velocity field $v_t$ that moves a known distribution $p_0$ into the distribution of the population, such that at $t = 1$ the distribution $p_1$ approximates the target distribution $q$.
That is, given samples $x_0$ from $p_0$ (which is pure noise) and samples $x_1$ from $q$ (derived through observation), we want to find $v_t$ such that the distribution $q$ is exactly the distribution of $\phi_1(x_0)$, where $x_0 \sim p_0$. Note that the distribution of $x_t = \phi_t(x_0)$ is given by the push-forward equation

$$p_t = [\phi_t]_* p_0$$

due to the change-of-variables formula. In other words, we have

$$p_t(x) = p_0\big(\phi_t^{-1}(x)\big)\left|\det \frac{\partial \phi_t^{-1}}{\partial x}(x)\right|.$$
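As a quick sanity check of the change-of-variables formula (a worked example added here, not from the paper), consider a one-dimensional translation flow $\phi_t(x) = x + t\mu$ with constant velocity field $v_t(x) = \mu$. Then $\phi_t^{-1}(x) = x - t\mu$, the Jacobian determinant is $1$, and

$$p_t(x) = p_0(x - t\mu),$$

i.e. the initial density is simply shifted by $t\mu$, exactly as the moving-fluid picture suggests.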
The ODE and the two change-of-variables equations above connect the velocity field $v_t$ with the probability distribution $p_t$. However, using all three equations to find $v_t$ from $p_t$ is cumbersome, especially the Jacobian-determinant term, which involves a $d \times d$ matrix. We take an idea from physics (the probability "current" from quantum mechanics) to simplify the relationship into a partial differential equation.
How to Visualize
The author finds it helpful to visualize $p_t(x)$ as the time evolution of $|\psi(x,t)|^2$, where $\psi$ describes the wave function of an electron. This may help give an intuitive understanding of the mechanics of flow matching: we are simply aiming to find the velocity of an electron in $d$ dimensions.
Warning for Mathematicians
Note the equations below are not mathematically rigorous. A rigorous treatment of the divergence theorem in $d$ dimensions requires a thorough understanding of differential forms, which is considered beyond the scope of this article.
Since probability is globally conserved (i.e. the integral of $p_t(x)$ over $\mathbb{R}^d$ is always identically equal to $1$) and the flow is bijective (i.e. $\phi_t$ is invertible, so no fluid is created or destroyed), we have the continuity equation: the rate of change of the probability contained in any region is equal to the probability "current" (or flux) through its surface (all probability outflow occurs at the surface).
Mathematically, for any region $\Omega \subseteq \mathbb{R}^d$ this is

$$\frac{d}{dt}\int_\Omega p_t(x)\,dx = -\oint_{\partial\Omega} p_t(x)\, v_t(x)\cdot \hat{n}\; dS = -\int_\Omega \nabla\cdot\big(p_t(x)\, v_t(x)\big)\,dx$$

by the divergence theorem.
If the equation above holds for any region $\Omega$, the integrands must be equal, and we have

$$\frac{\partial p_t(x)}{\partial t} + \nabla\cdot\big(p_t(x)\, v_t(x)\big) = 0.$$
In other words, given $p_t$, the problem of finding $v_t$ is to solve this continuity-equation PDE (the converse holds as well: a $v_t$ satisfying the PDE generates the path $p_t$).
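To make the continuity equation tangible, here is a small numerical check (added for illustration, with a made-up example): for the 1-D translation path $p_t(x) = \mathcal{N}(x \mid t\mu, 1)$ with constant velocity $v_t(x) = \mu$, the residual $\partial_t p_t + \partial_x(p_t v_t)$ should be numerically zero.

```python
import numpy as np

mu = 1.5                                  # drift of the translating Gaussian
xs = np.linspace(-5, 8, 2001)             # spatial grid
dt, dx = 1e-4, xs[1] - xs[0]

def p(t, x):
    # p_t(x) = N(x | t*mu, 1): a Gaussian translating at constant speed mu
    return np.exp(-0.5 * (x - t * mu) ** 2) / np.sqrt(2 * np.pi)

t = 0.3
dp_dt = (p(t + dt, xs) - p(t - dt, xs)) / (2 * dt)   # ∂p/∂t (central difference)
flux = p(t, xs) * mu                                 # p_t(x) * v_t(x)
div_flux = np.gradient(flux, dx)                     # ∂/∂x (p v)
residual = dp_dt + div_flux
print("max |continuity residual|:", np.abs(residual).max())  # ~0 up to grid error
```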
How to Implement Flow Matching
Loss Function for Flow Matching
Restating the Problem
Let the distribution of the population be denoted $q$, with $x_1$ a sample from the population. Note that in the problem of finding $v_t$, we have no access to $q$ directly, only samples from $q$, denoted $x_1 \sim q$.
The flow matching objective is to find $v_t$ such that $x_0$ can be sampled from a known distribution (e.g. $\mathcal{N}(0, I)$) while $p_1$ approximates $q$. Let such a target probability path be denoted $p_t$.
Note that $p_t$ is completely determined by an invertible transformation $\phi_t$ called the "flow", as demonstrated in the change-of-variables equations above. But $\phi_t$, as shown earlier, is merely the unique solution admitted by a first-order differential equation determined completely by the velocity field. Let the velocity field corresponding to the target path $p_t$ be written as $u_t$.
Thus, given a target probability density path $p_t(x)$ and its corresponding velocity field $u_t(x)$, we want to find a vector field $v_t(x;\theta)$ with learnable parameters $\theta$ such that the flow matching loss, defined by the mean squared error (MSE)

$$\mathcal{L}_{FM}(\theta) = \mathbb{E}_{t\sim\mathcal{U}[0,1],\; x\sim p_t(x)}\,\big\| v_t(x;\theta) - u_t(x) \big\|^2,$$

is minimized.
In other words, we want to regress $v_t(\cdot;\theta)$ to $u_t$. For sufficiently small $\mathcal{L}_{FM}$, it is clear the probability path generated by $v_t(\cdot;\theta)$ will closely approximate $p_t$. The problem with $\mathcal{L}_{FM}$ is how to find $p_t$ and $u_t$ in order to calculate it. It is clear there is more than one possible $p_t$ (that admits the normal distribution at $t = 0$ and the target distribution $q$ at $t = 1$). Even more importantly, it is unknown how to find a corresponding $u_t$.
The authors aim to show that $p_t$ and $u_t$ can be constructed from samples from $q$ through an appropriate method of aggregation. The construction is shown in Constructing a Velocity Field.
Given a normal distribution $\mathcal{N}(x \mid x_1, \sigma^2 I)$ centered on an observation $x_1$ with small enough $\sigma$, it is clear that the marginal probability satisfies

$$\int \mathcal{N}(x \mid x_1, \sigma^2 I)\, q(x_1)\, dx_1 \;\approx\; q(x).$$

This is indeed the basic premise of any scientific measurement (i.e. the original distribution can be inferred from imperfect measurements that have sufficiently high precision, i.e. sufficiently small $\sigma$).
Based on this fact we design a conditional probability path given one observation $x_1$, written as

$$p_t(x \mid x_1) = \mathcal{N}\big(x \mid \mu_t(x_1),\, \sigma_t(x_1)^2 I\big)$$

with $p_0(x \mid x_1) = \mathcal{N}(x \mid 0, I)$ and $p_1(x \mid x_1) = \mathcal{N}(x \mid x_1, \sigma^2 I)$. It is clear that for $t = 1$ the resulting marginal $\int p_1(x \mid x_1)\, q(x_1)\, dx_1$ will closely approximate the original distribution of the population, denoted $q$.
Thus the problem becomes determining a time trajectory $p_t(x \mid x_1)$ which connects these two end points, in order to determine the corresponding velocity field and thus approximate the target distribution $q$.
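One concrete choice of such a trajectory (the optimal-transport path from the original paper; treated here as an illustrative assumption, since this article has not yet fixed a schedule) is $\mu_t(x_1) = t\,x_1$ and $\sigma_t(x_1) = 1 - (1 - \sigma)t$, which interpolates between $\mathcal{N}(0, I)$ at $t = 0$ and $\mathcal{N}(x_1, \sigma^2 I)$ at $t = 1$. A minimal sampling sketch:

```python
import torch

sigma = 1e-2  # assumed terminal noise level

def sample_conditional_path(x1, t):
    """Sample x_t ~ p_t(x | x1) = N(mu_t(x1), sigma_t(x1)^2 I)
    for the OT-style schedule mu_t = t*x1, sigma_t = 1 - (1 - sigma)*t."""
    mu_t = t * x1
    sigma_t = 1.0 - (1.0 - sigma) * t
    return mu_t + sigma_t * torch.randn_like(x1)

x1 = torch.randn(4, 2)                 # a toy batch of "observations"
t = torch.rand(4, 1)                   # one time per sample
xt = sample_conditional_path(x1, t)    # x_t drawn from the conditional path
print(xt.shape)
```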
The question to ask is then: what is the velocity field for the marginal probability path $p_t(x) = \int p_t(x \mid x_1)\, q(x_1)\, dx_1$?
A natural answer might be exactly the expectation of the conditional velocity fields $u_t(x \mid x_1)$ which correspond to the conditional paths $p_t(x \mid x_1)$, that is

$$u_t(x) = \int u_t(x \mid x_1)\, \frac{p_t(x \mid x_1)\, q(x_1)}{p_t(x)}\, dx_1.$$
The marginal velocity field $u_t(x)$ described above generates exactly the marginal probability trajectory $p_t(x)$ described above. To verify this, we check that the pair satisfies the continuity equation

$$\frac{\partial p_t(x)}{\partial t} + \nabla\cdot\big(p_t(x)\, u_t(x)\big) = 0.$$

Intuitively, this follows from the fact that $\frac{\partial}{\partial t}$ and $\nabla\cdot$ are linear operators, so they can be exchanged with the integral over $x_1$. To verify this,

$$\frac{\partial p_t(x)}{\partial t} = \int \frac{\partial p_t(x \mid x_1)}{\partial t}\, q(x_1)\, dx_1 = -\int \nabla\cdot\big(u_t(x \mid x_1)\, p_t(x \mid x_1)\big)\, q(x_1)\, dx_1 = -\nabla\cdot\Big(\int u_t(x \mid x_1)\, p_t(x \mid x_1)\, q(x_1)\, dx_1\Big) = -\nabla\cdot\big(u_t(x)\, p_t(x)\big),$$

where the second equality uses the fact that each conditional pair $\big(p_t(\cdot \mid x_1),\, u_t(\cdot \mid x_1)\big)$ satisfies the continuity equation, and the last equality uses the definition of $u_t(x)$.
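To make the averaging concrete, here is a toy numerical sketch (added for illustration, not from the paper): with a small finite set of observations standing in for $q$, the marginal field at a point $x$ is the posterior-weighted average of the conditional fields. The Gaussian path and its conditional field below use the same OT-style schedule assumed earlier.

```python
import numpy as np

sigma = 1e-2
x1s = np.array([-2.0, 0.5, 3.0])   # toy 1-D "dataset" standing in for q

def cond_density(x, x1, t):
    # p_t(x | x1) = N(x | t*x1, (1 - (1 - sigma)*t)^2)
    s = 1.0 - (1.0 - sigma) * t
    return np.exp(-0.5 * ((x - t * x1) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def cond_field(x, x1, t):
    # u_t(x | x1) = (x1 - (1 - sigma)*x) / (1 - (1 - sigma)*t) for this schedule
    return (x1 - (1.0 - sigma) * x) / (1.0 - (1.0 - sigma) * t)

def marginal_field(x, t):
    # u_t(x) = sum_i w_i * u_t(x | x1_i), with weights w_i ∝ p_t(x | x1_i) q(x1_i)
    w = np.array([cond_density(x, x1, t) for x1 in x1s])
    w = w / w.sum()                       # q is uniform over the toy dataset
    u = np.array([cond_field(x, x1, t) for x1 in x1s])
    return (w * u).sum()

print(marginal_field(x=0.2, t=0.5))
```

In practice the average runs over the entire (unknown) data distribution, which is exactly why $u_t(x)$ and the objective above are intractable and the conditional objective below is needed.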
Despite constructing $p_t$ and $u_t$ explicitly through integrals, the problem of minimizing $\mathcal{L}_{FM}(\theta)$ remains intractable, as not only $u_t(x)$ but also $p_t(x)$ contains a complicated integral over the data distribution. A simpler approach is "conditional flow matching", which has the same optimum as the original $\mathcal{L}_{FM}$. This objective is given as

$$\mathcal{L}_{CFM}(\theta) = \mathbb{E}_{t\sim\mathcal{U}[0,1],\; x_1\sim q,\; x\sim p_t(x \mid x_1)}\,\big\| v_t(x;\theta) - u_t(x \mid x_1) \big\|^2.$$
To use $\mathcal{L}_{CFM}$ in place of $\mathcal{L}_{FM}$, we need to prove that $\nabla_\theta \mathcal{L}_{CFM}$ is in fact equal to $\nabla_\theta \mathcal{L}_{FM}$, i.e. the two losses have the same gradient with regard to $\theta$ for the marginal probability path $p_t$ and marginal velocity field $u_t$ constructed above.
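A minimal training-step sketch of the conditional objective (added for illustration; the tiny network, the OT-style schedule assumed above, and all hyperparameters are placeholder choices, not the paper's setup):

```python
import torch
import torch.nn as nn

sigma = 1e-2
model = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, 2))  # v_theta(t, x) for 2-D data
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def cfm_step(x1):
    """One stochastic estimate of L_CFM for a batch of observations x1 ~ q."""
    t = torch.rand(x1.shape[0], 1)                     # t ~ U[0, 1]
    sigma_t = 1.0 - (1.0 - sigma) * t
    x = t * x1 + sigma_t * torch.randn_like(x1)        # x ~ p_t(x | x1)
    u = (x1 - (1.0 - sigma) * x) / sigma_t             # u_t(x | x1) for this schedule
    v = model(torch.cat([t, x], dim=1))                # v_theta(t, x)
    loss = ((v - u) ** 2).sum(dim=1).mean()            # MSE regression to the conditional field
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

x1 = torch.randn(128, 2)    # stand-in batch of data samples
print(cfm_step(x1))
```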