On the Pythagorean Structure of the Optimal Transport for Separable Cost Functions

In this paper, we study the optimal transport problem induced by separable cost functions. In this framework, transportation can be expressed as the composition of two lower-dimensional movements. Through this reformulation, we prove that the random variable inducing the optimal transportation plan enjoys a conditional independence property. We conclude the paper by focusing on some significant settings. In particular, we study the problem in the Euclidean space endowed with the squared Euclidean distance. In this instance, we retrieve an explicit formula for the optimal transportation plan between any couple of measures as long as one of them is supported on a straight line.


Introduction
The Optimal Transport (OT) problem is a classical minimization problem dating back to the work of Monge [17] and Kantorovich [14,12]. In this problem, we are given two probability measures, namely µ and ν, and we search for the cheapest way to reshape µ into ν. The effort needed in order to perform this transformation depends on a cost function, which describes the underlying geometry of the product space of the support of the two measures. In the right setting, this effort induces a distance between probability measures.
During the last century, the OT problem has been fruitfully used in many applied fields such as the study of systems of particles by Dobrushin [9], the Boltzmann equation by Tanaka [23,22,18], and the field of fluid dynamics by Yann Brenier [7]. All these results pointed out that, through a qualitative description of optimal transport, it was possible to gain insightful information on many open problems. For this reason, the OT problem has become a topic of major interest for analysts, probabilists and statisticians [24,3,21]. In particular, a plethora of results concerning the uniqueness [11,8,10], the structure [1,2,20], and the regularity [16,6] of the optimal transportation plan in the continuous framework has been proved.
In this paper, we specialize the problem to separable cost functions. A cost function is said to be separable if it is the sum of two independent pieces. On $\mathbb{R}^2$, this means $c : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$, $c(x, y) = c_1(x_1, y_1) + c_2(x_2, y_2)$, where $x = (x_1, x_2)$ and $y = (y_1, y_2)$. The most famous ground distances meeting this condition are the cost functions induced by the $l_p$-norms, which are $c_p(x, y) = |x_1 - y_1|^p + |x_2 - y_2|^p$.
In [4], the authors exploited the geometry induced by this class of cost functions to reformulate the transportation problem between discrete measures as an efficient uncapacitated minimum cost flow problem. In this paper, we delve further into the properties of those flows, which we will be calling cardinal flows, to retrieve significant information on the Wasserstein cost and the structure of optimal transportation plans. Let $(f^{(1)}, f^{(2)})$ be an optimal cardinal flow between µ and ν for the generic separable cost function $c = c_1 + c_2$ and let ζ be the common marginal of $f^{(1)}$ and $f^{(2)}$, i.e. the marginal that both flows share on $Y_1 \times X_2$.
In our first and main result, we show that the restriction of $f^{(1)}$ to a horizontal line is an optimal transportation plan between the restriction of µ and ζ on the same line. Similarly, the restriction of $f^{(2)}$ on any vertical line is an optimal transportation plan between the restriction of ζ and ν. By expressing this property through the conditional laws of µ, ζ, and ν, we obtain identity (2), which expresses the Wasserstein cost $W_c(\mu, \nu)$ between µ and ν through lower-dimensional transportation problems between those conditional laws. We call a measure ζ satisfying the identity (2) a pivot measure. We show that, given an optimal cardinal flow and its pivot measure, it is possible to retrieve an optimal transportation plan. Moreover, among all the possible transportation plans, there exists one whose first and last marginals are independent given the other two. We conclude the paper by analyzing formula (2) in two specific frameworks.
In the first one, the cost function is the sum of two independent distances, i.e. $c = d$ is a separable distance. In this setting, the Wasserstein distance $W_d$ inherits a weaker version of the separability; in particular, we have $W_d(\mu, \nu) = W_d(\mu, \zeta) + W_d(\zeta, \nu)$ for any pivot measure ζ.
The other case we consider is $X = Y = \mathbb{R}^2$ and c is a cost function as in (1). In this case, knowing the pivot measure is enough to retrieve an optimal transportation plan. This is possible because the optimal transportation plan between one-dimensional measures has an explicit formula. In particular, we can rewrite formula (2) through the pseudo-inverse cumulative functions of the conditional laws of µ, ν, and ζ. Finally, we show that, when the cost function is the squared Euclidean distance, this formula allows us to express the optimal cardinal flow (and therefore the optimal transportation plan) between a measure supported on a straight line and any other measure through a closed formula.

Preliminaries
In this section, we fix our notation and recall the Optimal Transportation problem.
To keep the discussion as general as possible, we only require X and Y to be Polish spaces.
For a complete discussion on these topics, we refer to [3,5,24].

Basic Notions of Measure Theory
Given a Polish space $(X, d)$, we denote with $\mathcal{P}(X)$ the set of all the Borel probability measures over X. Given $\mu \in \mathcal{P}(X)$, we denote with $L^p_\mu$ the set of $L^p$-integrable functions with respect to the measure µ. Given $\mu \in \mathcal{P}(X)$ and $\nu \in \mathcal{P}(Y)$, we denote with $\mu \otimes \nu$ the direct product measure of µ and ν. We say that $\mu \in \mathcal{P}(X)$ has finite p-th moment if, given any $x_0 \in X$, $\int_X d^p(x, x_0)\, d\mu < +\infty$. Finally, we denote with $\mathcal{P}_p(X)$ the set of measures with finite p-th moment.
Definition 1 (Push-forward of measures). Let X and Y be two Polish spaces, $T : X \to Y$ a measurable function, and $\mu \in \mathcal{P}(X)$. The push-forward measure of µ through T is defined as $T_\# \mu(B) := \mu(T^{-1}(B))$ for each Borel set $B \subset Y$.
Remark 1. Given a measurable function $T : X \to Y$ and $\mu \in \mathcal{P}(X)$, there is an integral equation that characterizes the push-forward measure $T_\# \mu$. Given any measurable function φ on Y, the push-forward measure of µ through T is the only probability measure ν satisfying the identity $\int_Y \varphi\, d\nu = \int_X \varphi \circ T\, d\mu$.
Lemma 1 (Chain Rule for Push-forwards). Let X, Y, and Z be three Polish spaces and let $T_1 : X \to Y$, $T_2 : Y \to Z$ be two measurable functions. Given any $\mu \in \mathcal{P}(X)$, the chain rule $(T_2 \circ T_1)_\# \mu = (T_2)_\# ((T_1)_\# \mu)$ holds true.
Let us assume that the Polish space X is the direct product of two Polish spaces $X_1$ and $X_2$. The projections over $X_1$ and $X_2$ are then defined as $p_{X_1}(x) := x_1$ and $p_{X_2}(x) := x_2$ respectively, where $x = (x_1, x_2)$ is a generic point of X. Those functions are continuous (and hence measurable).
The first marginal of µ is the probability measure $\mu_1 \in \mathcal{P}(X_1)$ defined as $\mu_1 := (p_{X_1})_\# \mu$.
Similarly, the second marginal of µ is the probability measure $\mu_2 \in \mathcal{P}(X_2)$ defined as $\mu_2 := (p_{X_2})_\# \mu$.
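For finitely supported measures, the push-forward and the marginals can be computed by hand. The following Python sketch is only an illustration under our own assumptions: the measure `mu`, the helper `push_forward`, and the maps `T1`, `T2` are hypothetical names that do not appear in the paper. It computes the two marginals as push-forwards through the projections and checks the chain rule of Lemma 1 on one example.

```python
from collections import defaultdict
from fractions import Fraction


def push_forward(mu, T):
    """Return T_# mu for a finitely supported measure mu given as {point: mass}."""
    nu = defaultdict(Fraction)
    for x, mass in mu.items():
        nu[T(x)] += mass  # the mass sitting at x is moved to T(x)
    return dict(nu)


# Hypothetical measure on X = X1 x X2 with three atoms (illustrative data).
mu = {(0, 0): Fraction(1, 2), (0, 1): Fraction(1, 4), (1, 1): Fraction(1, 4)}

# Marginals are push-forwards through the coordinate projections.
mu1 = push_forward(mu, lambda x: x[0])
mu2 = push_forward(mu, lambda x: x[1])

# Chain rule on one example: (T2 o T1)_# mu equals (T2)_# ((T1)_# mu).
T1 = lambda x: x[0] + x[1]
T2 = lambda s: s % 2
composed = push_forward(mu, lambda x: T2(T1(x)))
chained = push_forward(push_forward(mu, T1), T2)
```

Exact rational masses are used so that the equalities between measures can be checked without rounding issues.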
Definition 3 (Disintegration of a Measure). Let $f : X \to Y$ be a measurable function and $\mu \in \mathcal{P}(X)$. We say that a family $\{\mu_y\}_{y \in Y}$ is a disintegration of µ according to f if every $\mu_y$ is a probability measure concentrated on $f^{-1}(\{y\})$ such that $y \to \int_X \varphi\, d\mu_y$ is a measurable function and $\int_X \varphi\, d\mu = \int_Y \left( \int_X \varphi\, d\mu_y \right) d\nu(y)$ for every $\varphi \in C(X)$, where $\nu = f_\# \mu$. With a slight abuse of notation, the disintegration of µ is also written as $\mu = \mu_y \otimes \nu$.
The measure $\mu_{|x_1}$ is called the conditional law of µ given $x_1$.

The Optimal Transport Problem
The first formulation of the transportation problem is the one due to Monge and, in modern language, to Kantorovich. In [13], the author modelled the transshipment of mass through a probability measure over the product space $X \times Y$. He called those measures transportation plans.
Definition 4 (Transportation Plan). Let µ and ν be two measures over two Polish spaces X and Y. The probability measure $\pi \in \mathcal{P}(X \times Y)$ is a transportation plan between µ and ν if $(p_X)_\# \pi = \mu$ and $(p_Y)_\# \pi = \nu$.
We denote with $\Pi(\mu, \nu)$ the set of all the transportation plans between µ and ν.
Definition 5 (Transportation Functional). Let $\mu \in \mathcal{P}(X)$, $\nu \in \mathcal{P}(Y)$, and let $c : X \times Y \to \mathbb{R} \cup \{+\infty\}$ be a lower semi-continuous function such that there exist two upper semi-continuous functions $a \in L^1_\mu$ and $b \in L^1_\nu$ such that $c(x, y) \geq a(x) + b(y)$ for each $(x, y) \in X \times Y$. The transportation functional $T_c : \Pi(\mu, \nu) \to \mathbb{R} \cup \{+\infty\}$ is defined as $T_c(\pi) := \int_{X \times Y} c\, d\pi$. (7)
The conditions imposed on the cost function in Definition 5 are the minimal ones for which it makes sense to define the integral in (7). Under those conditions, we define the following minimum problem.

Definition 6 (Minimal Transportation Cost). Let us take a cost function $c : X \times Y \to \mathbb{R} \cup \{+\infty\}$ as in Definition 5. The minimal transportation cost functional $W_c$ is defined as $W_c(\mu, \nu) := \inf_{\pi \in \Pi(\mu, \nu)} T_c(\pi)$. (8) The value $W_c(\mu, \nu)$ is also called Wasserstein cost between µ and ν.
By making further assumptions on c, it is possible to prove that the infimum in (8) is a minimum. In particular, when the cost function is non-negative, a minimizing solution exists. We denote with $\Gamma_o(\mu, \nu)$ the set of minimizers. For a complete discussion on the existence of the solution, we refer to [24, Chapter 4].
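To make the minimum problem concrete, consider two measures with n atoms of mass 1/n each. In that case it is a standard fact (Birkhoff-von Neumann theorem) that the minimum of the transportation functional is attained at a plan induced by a permutation, so for very small n the minimal cost can be found by enumeration. The Python sketch below relies on this fact; the function names, the choice p = 2, and the sample points are our own illustrative assumptions, and integer coordinates are assumed so that the arithmetic stays exact.

```python
from fractions import Fraction
from itertools import permutations


def separable_cost(x, y, p=2):
    """c_p(x, y) = |x1 - y1|^p + |x2 - y2|^p, a separable cost on R^2."""
    return abs(x[0] - y[0]) ** p + abs(x[1] - y[1]) ** p


def min_transport_cost(xs, ys, cost):
    """Minimal transportation cost between the uniform measures on xs and ys.

    Brute force over permutation plans; only sensible for a handful of atoms.
    """
    n = len(xs)
    assert n == len(ys), "both measures need the same number of atoms"
    best = min(
        sum(cost(xs[i], ys[s[i]]) for i in range(n))
        for s in permutations(range(n))
    )
    return Fraction(best, n)  # every atom carries mass 1/n


# Illustrative data: two atoms each, integer coordinates.
xs = [(0, 0), (1, 0)]
ys = [(1, 1), (0, 1)]
w = min_transport_cost(xs, ys, separable_cost)
```

Here the identity pairing costs 2 + 2 and the swapped pairing costs 1 + 1, so the swapped pairing is optimal and the minimal cost is 1. For larger instances a linear programming or assignment solver would replace the enumeration.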
Lemma 3 (Measurable Selection of Plans, Villani [24], Chapter 5, Corollary 5.22). Let X and Y be two Polish spaces and $c : X \times Y \to \mathbb{R} \cup \{+\infty\}$ a continuous cost function such that $\inf c > -\infty$. Given Ω a Polish space and $\lambda \in \mathcal{P}(\Omega)$, consider a measurable map $\omega \to (\mu_\omega, \nu_\omega)$ that goes from Ω to $\mathcal{P}(X) \times \mathcal{P}(Y)$. Then there is a measurable choice $\omega \to \pi_\omega$ where, for each ω, $\pi_\omega$ is an optimal transportation plan between $\mu_\omega$ and $\nu_\omega$.

One Dimensional Case
When both the measures µ and ν are supported on $\mathbb{R}$ and the cost function c is convex, the solution exists, is unique, and is characterized by the pseudo-inverses of the cumulative functions of µ and ν.
Definition 7. Given $\mu, \nu \in \mathcal{P}(\mathbb{R})$, the co-monotone transportation plan $\gamma_{mon}$ between µ and ν is defined as $\gamma_{mon} := (F_\mu^{[-1]}, F_\nu^{[-1]})_\# \mathcal{L}_{|[0,1]}$, where $F_\mu^{[-1]}$ and $F_\nu^{[-1]}$ are the pseudo-inverses of the cumulative functions of µ and ν, respectively, and $\mathcal{L}_{|[0,1]}$ is the Lebesgue measure restricted to $[0, 1]$.
Theorem 1 (Optimality of the co-monotone plan, Santambrogio [21], Chapter 2, Theorem 2.9). Let $h : \mathbb{R} \to \mathbb{R}_+$ be a strictly convex function such that $h(0) = 0$ and $\mu, \nu \in \mathcal{P}(\mathbb{R})$. Consider the cost $c(x, y) = h(|x - y|)$ and suppose that this cost is feasible for the transportation problem. Then the Optimal Transportation problem has a unique solution, which is $\gamma_{mon}$.
Knowing how to express the optimal transportation plan allows us to express the Wasserstein cost through the pseudo-inverses of the cumulative functions of µ and ν.
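For two uniform measures with the same number of atoms on the real line, the pseudo-inverse cumulative functions are step functions that jump at the sorted atoms, so the co-monotone plan simply pairs the i-th smallest atom of µ with the i-th smallest atom of ν. The following Python sketch illustrates this special case only; the function name `comonotone_cost`, the default choice h(t) = t², and the sample data are our assumptions.

```python
from fractions import Fraction


def comonotone_cost(xs, ys, h=lambda t: t * t):
    """Pair sorted atoms of two uniform measures on R and return (pairs, cost).

    Assumes len(xs) == len(ys), mass 1/n on every atom, and integer atoms
    so that the cost can be returned as an exact fraction.
    """
    n = len(xs)
    assert n == len(ys), "both measures need the same number of atoms"
    pairs = list(zip(sorted(xs), sorted(ys)))  # monotone rearrangement
    total = sum(h(abs(x - y)) for x, y in pairs)
    return pairs, Fraction(total, n)


# Illustrative data on the real line.
pairs, w = comonotone_cost([3, 0, 1], [2, 5, 4])
```

With these data the sorted atoms are 0, 1, 3 and 2, 4, 5, the pairing is (0, 2), (1, 4), (3, 5), and the cost is (4 + 9 + 4)/3 = 17/3.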

Wasserstein Distance
If we take $X = Y$ and choose d as the cost function, the optimal transportation problem lifts the distance d over the space $\mathcal{P}_p(X)$. The resulting distance is called the Wasserstein distance.
The p-order Wasserstein distance between the probability measures µ and ν on X is defined as $W^p_{d^p}(\mu, \nu) := \inf_{\pi \in \Pi(\mu, \nu)} T_{d^p}(\pi)$.
When $p = 1$, the 1-Wasserstein distance is also known as Kantorovich-Rubinstein distance.
The Wasserstein distance $W_{d^p}$ is well defined when used to compare measures in $\mathcal{P}_p(X)$.
Theorem 2. The $W_{d^p}$ distance is a finite distance over $\mathcal{P}_p(X)$. When the set X is bounded, the $W_{d^p}$ distance induces the weak topology on the space $\mathcal{P}_p(X)$. Moreover, $(\mathcal{P}_p(X), W_{d^p})$ is a Polish space.

Our Contribution
In this section, we report our main results. In paragraph 3.1, we study the properties of the cardinal flows and introduce the pivot measure. Our main result on the structure of the Wasserstein cost is stated in Theorem 4. In paragraph 3.2, we show how to retrieve an optimal transportation plan from an optimal cardinal flow. Moreover, we prove that there exists an optimal transportation plan whose first and last marginal laws are independent given the other two. Finally, in paragraph 3.3, we analyze our formula in two specific frameworks.

The Cardinal Flow and the Pivot Measure Formulation
From now on, we assume X and Y to be the product of smaller Polish spaces, i.e. $X = X_1 \times X_2$ and $Y = Y_1 \times Y_2$. In this framework, we can introduce the separable cost function and reformulate the optimal transportation problem as an optimal cardinal flow problem.
Definition 10 (Cardinal Flow). Let us take $\mu \in \mathcal{P}(X)$ and $\nu \in \mathcal{P}(Y)$. We say that the couple of measures $(f^{(1)}, f^{(2)}) \in \mathcal{P}(X \times Y_1) \times \mathcal{P}(X_2 \times Y)$ is a cardinal flow between µ and ν if it satisfies the following conditions:
• The marginal on X of $f^{(1)}$ is equal to µ, i.e. $(p_X)_\# f^{(1)} = \mu$.
• The marginal on Y of $f^{(2)}$ is equal to ν, i.e. $(p_Y)_\# f^{(2)} = \nu$.
• The flows $f^{(1)}$ and $f^{(2)}$ have the same marginal on $Y_1 \times X_2$, i.e. $(p_{Y_1 \times X_2})_\# f^{(1)} = (p_{Y_1 \times X_2})_\# f^{(2)}$.
We call the measures $f^{(1)}$ and $f^{(2)}$ first and second cardinal flow, respectively. We denote with $\mathcal{F}(\mu, \nu)$ the set of all cardinal flows between µ and ν.
Remark 2. For any couple of probability measures µ and ν, the set $\mathcal{F}(\mu, \nu)$ is nonempty.
In fact, a couple $(f^{(1)}, f^{(2)})$ belonging to $\mathcal{F}(\mu, \nu)$ can be defined explicitly.
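The three conditions of Definition 10 are easy to test on discrete measures. As an illustration, the Python sketch below takes the pair of product measures f1 = µ ⊗ ν1 and f2 = µ2 ⊗ ν and checks that it is a cardinal flow. This particular pair is our own assumption, chosen because it is the most natural candidate; the helper names `marginal` and `product` and the sample measures are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def marginal(measure, idx):
    """Marginal of a discrete measure {tuple: mass} on the coordinates in idx."""
    out = defaultdict(Fraction)
    for point, mass in measure.items():
        out[tuple(point[i] for i in idx)] += mass
    return dict(out)


def product(a, b):
    """Product measure of a and b; points are concatenated tuples."""
    return {p + q: ma * mb for p, ma in a.items() for q, mb in b.items()}


half = Fraction(1, 2)
mu = {(0, 0): half, (1, 2): half}  # measure on X = X1 x X2, points (x1, x2)
nu = {(3, 1): half, (4, 1): half}  # measure on Y = Y1 x Y2, points (y1, y2)

nu1 = marginal(nu, [0])  # first marginal of nu, points (y1,)
mu2 = marginal(mu, [1])  # second marginal of mu, points (x2,)

f1 = product(mu, nu1)  # measure on X x Y1, points (x1, x2, y1)
f2 = product(mu2, nu)  # measure on X2 x Y, points (x2, y1, y2)

is_cardinal_flow = (
    marginal(f1, [0, 1]) == mu  # marginal of f1 on X is mu
    and marginal(f2, [1, 2]) == nu  # marginal of f2 on Y is nu
    and marginal(f1, [2, 1]) == marginal(f2, [1, 0])  # same marginal on Y1 x X2
)
```

The common marginal on Y1 × X2 is here the product ν1 ⊗ µ2. This flow is admissible but in general not optimal.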
Definition 11 (Cardinal Flow Functional). Given two probability measures $\mu \in \mathcal{P}(X)$, $\nu \in \mathcal{P}(Y)$, and a separable cost function $c = c_1 + c_2$ over $X \times Y$, we define the first and second cardinal transportation functionals $CT^{(1)}_c$ and $CT^{(2)}_c$, evaluated on the first and second component of a cardinal flow $F = (f^{(1)}, f^{(2)}) \in \mathcal{F}(\mu, \nu)$, respectively. The total cardinal flow functional is then defined as $CT_c(F) = CT^{(1)}_c(f^{(1)}) + CT^{(2)}_c(f^{(2)})$.
Proof. The proof of this Theorem, in the discrete case, has been proposed in [4]. The proof in our generic setting follows from similar arguments.
The function L allows us to relate $T_c$ and $CT_c$ as follows: $T_c(\pi) = CT_c(L(\pi))$, $\forall \pi \in \Pi(\mu, \nu)$, which, in conjunction with the identity $L(\Pi(\mu, \nu)) = \mathcal{F}(\mu, \nu)$, allows us to conclude that the infimum of $CT_c$ is actually a minimum and that the set of minimizers of $CT_c$ coincides with the image of $\Gamma_o(\mu, \nu)$ through L. In particular, the cardinal flow problem inherits the uniqueness of the solution from the Optimal Transportation problem.
Corollary 2. If the optimal transportation plan is unique, so is the optimal cardinal flow.
Remark 4. Since the operator L is only surjective and not injective, the reverse implication is not true, i.e., given an optimal cardinal flow $(f^{(1)}, f^{(2)})$, there might exist a plethora of optimal transportation plans such that $L(\pi) = (f^{(1)}, f^{(2)})$.
Definition 14 (Pivot Measure). Let us take $\mu \in \mathcal{P}(X)$, $\nu \in \mathcal{P}(Y)$, and c a separable cost function. We say that $\zeta \in \mathcal{P}(Y_1 \times X_2)$ is a pivot measure between µ and ν if there exists an optimal cardinal flow $(f^{(1)}, f^{(2)})$ that glues on it.
Remark 5. From Lemma 1, we have that all the pivot measures are also intermediate measures.
Theorem 4. Let us take $\mu \in \mathcal{P}(X)$, $\nu \in \mathcal{P}(Y)$, and $c = c_1 + c_2$ a separable cost function. For any pivot measure ζ, it holds true the formula $W_c(\mu, \nu) =$ Similarly, we decompose µ and ν as respectively. For $\mu_2$-almost every $x_2 \in X_2$ is then well defined the problem Theorem 3 assures us that the selections $x_2 \to \mu_{|x_2}$ and $x_2 \to \sigma^{(1)}_{|x_2}$ are both measurable, hence, according to Lemma 3, there exists a measurable selection of optimal plans $\pi_{|x_2}$ for which holds true $\mu_2$-almost everywhere $x_2 \in X_2$. Similarly, there exists a measurable selection $\pi_{|y_1}$ for which, for $\nu_1$-almost every $y_1 \in Y_1$, holds true Let us now consider the measures $f^{(1)} \in \mathcal{P}(X \times Y_1)$ and $f^{(2)} \in \mathcal{P}(X_2 \times Y)$, defined as it follows The couple $(f^{(1)}, f^{(2)})$ is a cardinal flow between µ and ν, in fact, given a $\varphi \in L^1_\mu$, we have hence, $(p_X)_\# f^{(1)} = \mu$. Similarly, we get hence $(f^{(1)}, f^{(2)}) \in \mathcal{F}(\mu, \nu)$. From the identities (16) and (17), we have To prove the other inequality, let us take $(f^{(1)}, f^{(2)}) \in \mathcal{F}(\mu, \nu)$. By definition, we have and $(p_{Y_1})_\# f^{(2)} = (p_{Y_1})_\# \nu = \nu_1$.
By disintegrating $f^{(1)}$ and $f^{(2)}$ with respect to the variables $x_2$ and $y_1$, respectively, we find $f^{(1)} = \psi_{|x_2} \otimes \mu_2$ and $f^{(2)} = \varphi_{|y_1} \otimes \nu_1$.
From formula (14), we deduce that the computation of the Wasserstein cost between two generic measures can be achieved in two steps: detecting the pivot measure and solving a family of lower-dimensional problems. In particular, if we are able to solve the lower-dimensional transportation problems, the only unknown left to determine is the pivot measure.
To conclude we just need to prove the other inequality.
Let us fix $\zeta \in \mathcal{I}(\mu, \nu)$. Following the steps of the proof of Theorem 4, we disintegrate µ, ζ, and ν (see (14)-(15)), find the optimal transportation plans between the conditional measures, and define the cardinal flow as done in (18). Since the couple $(f^{(1)}, f^{(2)}) \in \mathcal{F}(\mu, \nu)$, we have

Independence of the Optimal Coupling
As we noticed in Remark 4, an optimal transportation plan determines one optimal cardinal flow; however, the converse is not true. The next example showcases how, even if we have a unique pivot measure and a unique optimal cardinal flow, we might retrieve infinitely many optimal transportation plans.
In particular, the transportation plan minimizes Tc.
As a straightforward consequence we get the following.
Theorem 6. Let $\mu \in \mathcal{P}(X)$, $\nu \in \mathcal{P}(Y)$, and $c = c_1 + c_2$ a separable cost function. Let us assume the transportation problem between µ and ν has a unique solution π. Then, if $(X_1, X_2, Y_1, Y_2)$ is the coupling inducing the law π, we have
1. $X_1$ and $Y_2$ are conditionally independent given $X_2$ and $Y_1$,
2. $X_2$ and $Y_1$ are conditionally independent given $X_1$ and $Y_2$.
Proof. The first two statements follow from the fact that $\mathcal{I}(\mu, \nu)$ contains only one measure, which is $\nu_1 \otimes \delta_{x_2}$ in the first case and $\delta_{\bar{y}_1} \otimes \mu_2$ in the second one. If both (25) and (26) hold, also $\mathcal{F}(\mu, \nu)$ contains only one element, $(f^{(1)}, f^{(2)}) = (\mu \otimes \delta_{\bar{y}_1}, \delta_{x_2} \otimes \nu)$.

Two Specific Frameworks
To conclude, we set our study in two specific frameworks. In the first one, the cost function is a separable distance, i.e. the sum of two distances. In the second one, the measures are supported over $\mathbb{R}^2$ and the cost function has the form (2).

Separable Distances
Since $X = Y$, we need to slightly change the notation in order to avoid confusion. We denote the generic point $(x^{(1)}, x^{(2)}) \in X \times X$ with $(x^{(1)}, x^{(2)}) = ((x^{(1)}_1, x^{(1)}_2), (x^{(2)}_1, x^{(2)}_2))$; hence, we denote with $x^{(i)}$ the i-th component in the space $X \times X$ and with $x^{(i)}_j$ the j-th component of $x^{(i)}$. The projections $p^{(i)} : X \times X \to X$ and $p^{(i)}_j : X \times X \to X_j$ are defined as $p^{(i)}(x^{(1)}, x^{(2)}) := x^{(i)}$ and $p^{(i)}_j(x^{(1)}, x^{(2)}) := x^{(i)}_j$ for $i = 1, 2$ and $j = 1, 2$. In particular, we have $p^{(i)} = (p^{(i)}_1, p^{(i)}_2)$.
Theorem 7. Let us take $\mu, \nu \in \mathcal{P}(X)$, where $X = X_1 \times X_2$, and let d be a separable distance over X, i.e. the sum of two distances.
To prove the other inequality, let us take pf p1q , f p2q q P Fpµ, νq an optimal cardinal flow.We define π p1q " `ppq p1q , pp However, the reverse implication is not true.Let us consider, for instance, X " R

Cardinal Flow in the Euclidean Space
Let us now consider $X = Y = \mathbb{R}^2$ and a separable cost function $c = c_1 + c_2$, with $c_i(x, y) = h(|x_i - y_i|)$, where $h : \mathbb{R} \to [0, \infty)$ is a convex function such that $h(0) = 0$. Due to the structure of $c_i$, Corollary 1 allows us to rewrite (13) in terms of pseudo-inverse functions. Theorem 9 becomes particularly useful when we take the squared Euclidean distance as cost function, i.e. $c(x, y) = |x_1 - y_1|^2 + |x_2 - y_2|^2$.
Indeed, since c is invariant under isometries, Theorem 9 generalizes to any µ supported on a straight line.
Since $(O^{(-1)})_\# \mu$ satisfies the hypothesis of Theorem 9, the thesis follows.

Figure 1: The lack of uniqueness showcased in Example 1. The support of µ is indicated by light grey circles, the support of ν by dark grey circles. Figure (a) showcases the optimal cardinal flow. The support of the pivot measure ζ is indicated by the black triangle. Figures (b), (c), and (d) showcase the transportation plans $\pi^1$, $\pi^2$, and π, respectively. Each of those transportation plans induces the cardinal flow described in (a).
When $X = Y$ and we choose $c = d$ as a cost function, Theorem 2 states that the Optimal Transportation problem lifts the distance structure from X to $\mathcal{P}_p(X)$. When d is separable, the induced distance $W_d$ inherits a weaker version of the separability, i.e. $W_d(\mu, \nu) = W_d(\mu, \zeta) + W_d(\zeta, \nu)$, for any pivot measure ζ.
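The identity for separable distances can be checked numerically on a small instance. The Python sketch below is an illustration under our own assumptions: both measures are uniform on three points of $\mathbb{R}^2$ with integer coordinates, d is the $l_1$ distance, an optimal plan is found by enumerating permutations, and the pivot measure is read off the optimal plan by pairing the first coordinate of the target with the second coordinate of the source. All function names and data are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def l1(x, y):
    """Separable distance d(x, y) = |x1 - y1| + |x2 - y2| on R^2."""
    return abs(x[0] - y[0]) + abs(x[1] - y[1])


def optimal_assignment(xs, ys):
    """Permutation minimizing the total l1 cost (brute force, tiny inputs only)."""
    n = len(xs)
    return min(
        permutations(range(n)),
        key=lambda s: sum(l1(xs[i], ys[s[i]]) for i in range(n)),
    )


def wasserstein(xs, ys):
    """W_d between the uniform measures on the point lists xs and ys."""
    s = optimal_assignment(xs, ys)
    n = len(xs)
    return Fraction(sum(l1(xs[i], ys[s[i]]) for i in range(n)), n)


# Illustrative supports of mu and nu.
xs = [(0, 0), (2, 1), (5, 3)]
ys = [(1, 4), (4, 0), (6, 2)]

# Atoms of the pivot measure induced by an optimal plan: (y1, x2) for each pair.
sigma = optimal_assignment(xs, ys)
zeta = [(ys[sigma[i]][0], xs[i][1]) for i in range(len(xs))]

lhs = wasserstein(xs, ys)
rhs = wasserstein(xs, zeta) + wasserstein(zeta, ys)
```

On these data the optimal cost is 10/3, the pivot atoms are (1, 0), (4, 1), (6, 3), and the two sides of the identity coincide. For a measure ζ that is not a pivot measure, only the triangle inequality is guaranteed.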