*Original Paper: Lipman, Y., Chen, R. T. Q., Ben-Hamu, H., Nickel, M., & Le, M. (2023). Flow Matching for Generative Modeling. arXiv:2210.02747.*
Let $x_t$ denote the position of a $d$-dimensional point of a fluid in $\mathbb{R}^d$. The time evolution of $x_t$ forms a bundle of time trajectories, the "flow" of the fluid with respect to time, referred to as the flow (or current) $\phi_t$. The flow (or current) follows a velocity field written as $v_t(x)$, where $v : [0,1] \times \mathbb{R}^d \to \mathbb{R}^d$. Specifically, given a velocity field $v_t$ we can completely characterize $x_t$ given we know its initial position $x_0$. Indeed, it is clear that the first-order ODE

$$\frac{d}{dt}\phi_t(x_0) = v_t\big(\phi_t(x_0)\big), \qquad \phi_0(x_0) = x_0$$

admits a unique solution (under mild regularity conditions on $v_t$).
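As a concrete illustration (not from the paper), here is a minimal numerical sketch: given any velocity field `v(t, x)`, forward-Euler integration of the ODE above traces out the flow. The particular field used here, a simple contraction toward a fixed target point, is a made-up placeholder.

```python
import numpy as np

def v(t, x, target=np.array([2.0, -1.0])):
    # Hypothetical velocity field: pushes points toward a fixed target.
    # A stand-in for illustration, not a learned field.
    return target - x

def integrate_flow(x0, v, n_steps=100):
    """Forward-Euler integration of dx/dt = v(t, x) over t in [0, 1]."""
    x, dt = x0.astype(float), 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * v(i * dt, x)
    return x

x0 = np.random.randn(2)        # initial position x_0 (e.g. drawn from noise)
x1 = integrate_flow(x0, v)     # approximate phi_1(x_0)
print(x0, "->", x1)
```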
The key insight of flow matching is to learn a velocity field $v_t$ that moves a known distribution $p_0$ into the distribution of the population, such that at $t = 1$ the distribution $p_1$ approximates the target distribution $q$.
That is, given samples $x_0$ from $p_0$ (which is pure noise) and samples $x_1$ from $q$ (derived through observation), we want to find $v_t$ such that the distribution $q$ is exactly the distribution of $\phi_1(x_0)$, where $x_0 \sim p_0$. Note that the distribution of $x_t = \phi_t(x_0)$ is given by the push-forward equation

$$p_t = [\phi_t]_* p_0$$

due to the change-of-variables formula. In other words, we have

$$p_t(x) = p_0\big(\phi_t^{-1}(x)\big)\left|\det \frac{\partial \phi_t^{-1}}{\partial x}(x)\right|.$$
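As a quick sanity check of the change-of-variables formula (a worked example added here, not from the paper), consider a one-dimensional translation flow $\phi_t(x) = x + t\mu$ with constant velocity field $v_t(x) = \mu$. Then $\phi_t^{-1}(x) = x - t\mu$, the Jacobian determinant is $1$, and

$$p_t(x) = p_0(x - t\mu),$$

i.e. the initial density is simply shifted by $t\mu$, exactly as the moving-fluid picture suggests.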
The ODE and the two change-of-variables equations above connect the velocity field $v_t$ with the probability distribution $p_t$. However, using all three equations to find $v_t$ from $p_t$ is cumbersome, especially the Jacobian-determinant term, which involves a $d \times d$ matrix. We take an idea from physics (the probability "current" from quantum mechanics) to simplify the relationship into a partial differential equation.
How to Visualize
The author finds it helpful to visualize $p_t(x)$ as the time evolution of $|\psi(x,t)|^2$, where $\psi$ describes the wave function of an electron. This may help give an intuitive understanding of the mechanics of flow matching: we are simply aiming to find the velocity of an electron in $d$ dimensions.
Warning for Mathematicians
Note the equations below are not mathematically rigorous. A rigorous treatment of the divergence theorem in $d$ dimensions requires a thorough understanding of differential forms, which is considered beyond the scope of this article.
Since probability is globally conserved (i.e. the integral of $p_t(x)$ over $\mathbb{R}^d$ is always identically equal to $1$) and the flow is bijective (i.e. $\phi_t$ is invertible, so no fluid is created or destroyed), we have the continuity equation: the rate of change of the probability contained in any region is equal to the probability "current" (or flux) through its surface (all probability outflow occurs at the surface).
Mathematically, for any region $\Omega \subseteq \mathbb{R}^d$ this is

$$\frac{d}{dt}\int_\Omega p_t(x)\,dx = -\oint_{\partial\Omega} p_t(x)\, v_t(x)\cdot \hat{n}\; dS = -\int_\Omega \nabla\cdot\big(p_t(x)\, v_t(x)\big)\,dx$$

by the divergence theorem.
If the equation above holds for any region $\Omega$, the integrands must be equal, and we have

$$\frac{\partial p_t(x)}{\partial t} + \nabla\cdot\big(p_t(x)\, v_t(x)\big) = 0.$$
In other words, given $p_t$, the problem of finding $v_t$ is to solve this continuity-equation PDE (the converse holds as well: a $v_t$ satisfying the PDE generates the path $p_t$).
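To make the continuity equation tangible, here is a small numerical check (added for illustration, with a made-up example): for the 1-D translation path $p_t(x) = \mathcal{N}(x \mid t\mu, 1)$ with constant velocity $v_t(x) = \mu$, the residual $\partial_t p_t + \partial_x(p_t v_t)$ should be numerically zero.

```python
import numpy as np

mu = 1.5                                  # drift of the translating Gaussian
xs = np.linspace(-5, 8, 2001)             # spatial grid
dt, dx = 1e-4, xs[1] - xs[0]

def p(t, x):
    # p_t(x) = N(x | t*mu, 1): a Gaussian translating at constant speed mu
    return np.exp(-0.5 * (x - t * mu) ** 2) / np.sqrt(2 * np.pi)

t = 0.3
dp_dt = (p(t + dt, xs) - p(t - dt, xs)) / (2 * dt)   # ∂p/∂t (central difference)
flux = p(t, xs) * mu                                 # p_t(x) * v_t(x)
div_flux = np.gradient(flux, dx)                     # ∂/∂x (p v)
residual = dp_dt + div_flux
print("max |continuity residual|:", np.abs(residual).max())  # ~0 up to grid error
```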
How to Implement Flow Matching
Loss Function for Flow Matching
Restating the Problem
Let the distribution of the population be denoted $q$, with $x_1$ a sample from the population. Note that in the problem of finding $v_t$, we have no access to $q$ directly, only samples from $q$, denoted $x_1 \sim q$.
The flow matching objective is to find $v_t$ such that $x_0$ can be sampled from a known distribution (e.g. $\mathcal{N}(0, I)$) while $p_1$ approximates $q$. Let such a target probability path be denoted $p_t$.
Note that $p_t$ is completely determined by an invertible transformation $\phi_t$ called the "flow", as demonstrated in the change-of-variables equations above. But $\phi_t$, as shown earlier, is merely the unique solution admitted by a first-order differential equation determined completely by the velocity field. Let the velocity field corresponding to the target path $p_t$ be written as $u_t$.
Thus, given a target probability density path $p_t(x)$ and its corresponding velocity field $u_t(x)$, we want to find a vector field $v_t(x;\theta)$ with learnable parameters $\theta$ such that the flow matching loss, defined by the mean squared error (MSE)

$$\mathcal{L}_{FM}(\theta) = \mathbb{E}_{t\sim\mathcal{U}[0,1],\; x\sim p_t(x)}\,\big\| v_t(x;\theta) - u_t(x) \big\|^2,$$

is minimized.
In other words, we want to regress $v_t(\cdot;\theta)$ to $u_t$. For sufficiently small $\mathcal{L}_{FM}$, it is clear the probability path generated by $v_t(\cdot;\theta)$ will closely approximate $p_t$. The problem with $\mathcal{L}_{FM}$ is how to find $p_t$ and $u_t$ in order to calculate it. It is clear there is more than one possible $p_t$ (that admits the normal distribution at $t = 0$ and the target distribution $q$ at $t = 1$). Even more importantly, it is unknown how to find a corresponding $u_t$.
The authors aim to show that $p_t$ and $u_t$ can be constructed from samples from $q$ through an appropriate method of aggregation. The construction is shown in Constructing a Velocity Field.
Given a normal distribution $\mathcal{N}(x \mid x_1, \sigma^2 I)$ centered on an observation $x_1$ with small enough $\sigma$, it is clear that the marginal probability satisfies

$$\int \mathcal{N}(x \mid x_1, \sigma^2 I)\, q(x_1)\, dx_1 \;\approx\; q(x).$$

This is indeed the basic premise of any scientific measurement (i.e. the original distribution can be inferred from imperfect measurements that have sufficiently high precision, i.e. sufficiently small $\sigma$).
Based on this fact we design a conditional probability path given one observation $x_1$, written as

$$p_t(x \mid x_1) = \mathcal{N}\big(x \mid \mu_t(x_1),\, \sigma_t(x_1)^2 I\big)$$

with $p_0(x \mid x_1) = \mathcal{N}(x \mid 0, I)$ and $p_1(x \mid x_1) = \mathcal{N}(x \mid x_1, \sigma^2 I)$. It is clear that for $t = 1$ the resulting marginal $\int p_1(x \mid x_1)\, q(x_1)\, dx_1$ will closely approximate the original distribution of the population, denoted $q$.
Thus the problem becomes determining a time trajectory $p_t(x \mid x_1)$ which connects these two end points, in order to determine the corresponding velocity field and thus approximate the target distribution $q$.
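One concrete choice of such a trajectory (the optimal-transport path from the original paper; treated here as an illustrative assumption, since this article has not yet fixed a schedule) is $\mu_t(x_1) = t\,x_1$ and $\sigma_t(x_1) = 1 - (1 - \sigma)t$, which interpolates between $\mathcal{N}(0, I)$ at $t = 0$ and $\mathcal{N}(x_1, \sigma^2 I)$ at $t = 1$. A minimal sampling sketch:

```python
import torch

sigma = 1e-2  # assumed terminal noise level

def sample_conditional_path(x1, t):
    """Sample x_t ~ p_t(x | x1) = N(mu_t(x1), sigma_t(x1)^2 I)
    for the OT-style schedule mu_t = t*x1, sigma_t = 1 - (1 - sigma)*t."""
    mu_t = t * x1
    sigma_t = 1.0 - (1.0 - sigma) * t
    return mu_t + sigma_t * torch.randn_like(x1)

x1 = torch.randn(4, 2)                 # a toy batch of "observations"
t = torch.rand(4, 1)                   # one time per sample
xt = sample_conditional_path(x1, t)    # x_t drawn from the conditional path
print(xt.shape)
```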
The question to ask is then: what is the velocity field for the marginal probability path $p_t(x) = \int p_t(x \mid x_1)\, q(x_1)\, dx_1$?
A natural answer might be exactly the expectation of the conditional velocity fields $u_t(x \mid x_1)$ which correspond to the conditional paths $p_t(x \mid x_1)$, that is

$$u_t(x) = \int u_t(x \mid x_1)\, \frac{p_t(x \mid x_1)\, q(x_1)}{p_t(x)}\, dx_1.$$
The marginal velocity field $u_t(x)$ described above generates exactly the marginal probability trajectory $p_t(x)$ described above. To verify this, we check that the pair satisfies the continuity equation

$$\frac{\partial p_t(x)}{\partial t} + \nabla\cdot\big(p_t(x)\, u_t(x)\big) = 0.$$

Intuitively, this follows from the fact that $\frac{\partial}{\partial t}$ and $\nabla\cdot$ are linear operators, so they can be exchanged with the integral over $x_1$. To verify this,

$$\frac{\partial p_t(x)}{\partial t} = \int \frac{\partial p_t(x \mid x_1)}{\partial t}\, q(x_1)\, dx_1 = -\int \nabla\cdot\big(u_t(x \mid x_1)\, p_t(x \mid x_1)\big)\, q(x_1)\, dx_1 = -\nabla\cdot\Big(\int u_t(x \mid x_1)\, p_t(x \mid x_1)\, q(x_1)\, dx_1\Big) = -\nabla\cdot\big(u_t(x)\, p_t(x)\big),$$

where the second equality uses the fact that each conditional pair $\big(p_t(\cdot \mid x_1),\, u_t(\cdot \mid x_1)\big)$ satisfies the continuity equation, and the last equality uses the definition of $u_t(x)$.
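To make the averaging concrete, here is a toy numerical sketch (added for illustration, not from the paper): with a small finite set of observations standing in for $q$, the marginal field at a point $x$ is the posterior-weighted average of the conditional fields. The Gaussian path and its conditional field below use the same OT-style schedule assumed earlier.

```python
import numpy as np

sigma = 1e-2
x1s = np.array([-2.0, 0.5, 3.0])   # toy 1-D "dataset" standing in for q

def cond_density(x, x1, t):
    # p_t(x | x1) = N(x | t*x1, (1 - (1 - sigma)*t)^2)
    s = 1.0 - (1.0 - sigma) * t
    return np.exp(-0.5 * ((x - t * x1) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def cond_field(x, x1, t):
    # u_t(x | x1) = (x1 - (1 - sigma)*x) / (1 - (1 - sigma)*t) for this schedule
    return (x1 - (1.0 - sigma) * x) / (1.0 - (1.0 - sigma) * t)

def marginal_field(x, t):
    # u_t(x) = sum_i w_i * u_t(x | x1_i), with weights w_i ∝ p_t(x | x1_i) q(x1_i)
    w = np.array([cond_density(x, x1, t) for x1 in x1s])
    w = w / w.sum()                       # q is uniform over the toy dataset
    u = np.array([cond_field(x, x1, t) for x1 in x1s])
    return (w * u).sum()

print(marginal_field(x=0.2, t=0.5))
```

In practice the average runs over the entire (unknown) data distribution, which is exactly why $u_t(x)$ and the objective above are intractable and the conditional objective below is needed.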
Despite constructing $p_t$ and $u_t$ explicitly through integrals, the problem of minimizing $\mathcal{L}_{FM}(\theta)$ remains intractable, as not only $u_t(x)$ but also $p_t(x)$ contains a complicated integral over the data distribution. A simpler approach is "conditional flow matching", which has the same optimum as the original $\mathcal{L}_{FM}$. This objective is given as

$$\mathcal{L}_{CFM}(\theta) = \mathbb{E}_{t\sim\mathcal{U}[0,1],\; x_1\sim q,\; x\sim p_t(x \mid x_1)}\,\big\| v_t(x;\theta) - u_t(x \mid x_1) \big\|^2.$$
To use $\mathcal{L}_{CFM}$ in place of $\mathcal{L}_{FM}$, we need to prove that $\nabla_\theta \mathcal{L}_{CFM}$ is in fact equal to $\nabla_\theta \mathcal{L}_{FM}$, i.e. the two losses have the same gradient with regard to $\theta$ for the marginal probability path $p_t$ and marginal velocity field $u_t$ constructed above.
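A minimal training-step sketch of the conditional objective (added for illustration; the tiny network, the OT-style schedule assumed above, and all hyperparameters are placeholder choices, not the paper's setup):

```python
import torch
import torch.nn as nn

sigma = 1e-2
model = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, 2))  # v_theta(t, x) for 2-D data
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def cfm_step(x1):
    """One stochastic estimate of L_CFM for a batch of observations x1 ~ q."""
    t = torch.rand(x1.shape[0], 1)                     # t ~ U[0, 1]
    sigma_t = 1.0 - (1.0 - sigma) * t
    x = t * x1 + sigma_t * torch.randn_like(x1)        # x ~ p_t(x | x1)
    u = (x1 - (1.0 - sigma) * x) / sigma_t             # u_t(x | x1) for this schedule
    v = model(torch.cat([t, x], dim=1))                # v_theta(t, x)
    loss = ((v - u) ** 2).sum(dim=1).mean()            # MSE regression to the conditional field
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

x1 = torch.randn(128, 2)    # stand-in batch of data samples
print(cfm_step(x1))
```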