Papers
2025
- A Transfer Principle for Computing the Adapted Wasserstein Distance between Stochastic Processes. Yifan Jiang and Fang Rui Lim. May 2025.
We propose a transfer principle to study the adapted 2-Wasserstein distance between stochastic processes. First, we obtain an explicit formula for the distance between real-valued mean-square continuous Gaussian processes by introducing the causal factorization as an infinite-dimensional analogue of the Cholesky decomposition for operators on Hilbert spaces. We discuss the existence and uniqueness of this causal factorization and link it to the canonical representation of Gaussian processes. As a byproduct, we characterize mean-square continuous Gaussian Volterra processes in terms of their natural filtrations. Moreover, for real-valued fractional stochastic differential equations, we show that the synchronous coupling between the driving fractional noises attains the adapted Wasserstein distance under some monotonicity conditions. Our results cover a wide class of stochastic processes which are neither Markov processes nor semi-martingales, including fractional Brownian motions and fractional Ornstein–Uhlenbeck processes. A discrete-time toy illustration of the synchronous coupling via Cholesky factors is sketched below.
@misc{jiang25transfer,
  title = {A Transfer Principle for Computing the Adapted {{Wasserstein}} Distance between Stochastic Processes},
  author = {Jiang, Yifan and Lim, Fang Rui},
  year = {2025},
  month = may,
  number = {arXiv:2505.21337},
  eprint = {2505.21337},
  primaryclass = {math},
  publisher = {arXiv},
}
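In discrete time, the synchronous coupling mentioned above can be written down directly from Cholesky factors: driving both Gaussian vectors with the same innovations through their lower-triangular factors gives a bicausal coupling, hence an upper bound on the adapted 2-Wasserstein distance, with equality in Gaussian settings of the kind treated in the paper. A minimal NumPy sketch; the grids, kernels, and function names are illustrative choices, not taken from the paper.

import numpy as np

def synchronous_coupling_cost(mean1, cov1, mean2, cov2):
    # Lower-triangular Cholesky factors play the role of the causal factorization.
    L1 = np.linalg.cholesky(cov1)
    L2 = np.linalg.cholesky(cov2)
    # Feed the SAME standard normal innovations Z through both factors:
    # X = mean1 + L1 @ Z, Y = mean2 + L2 @ Z is a bicausal coupling, and
    # E|X - Y|^2 = |mean1 - mean2|^2 + ||L1 - L2||_F^2.
    return np.sum((mean1 - mean2) ** 2) + np.sum((L1 - L2) ** 2)

# Toy example: Brownian motion vs. a stationary OU-type kernel on a time grid.
t = np.linspace(0.1, 1.0, 10)
cov_bm = np.minimum.outer(t, t)                          # Cov(B_s, B_t) = min(s, t)
cov_ou = 0.5 * np.exp(-np.abs(np.subtract.outer(t, t)))  # illustrative kernel
bound = np.sqrt(synchronous_coupling_cost(np.zeros_like(t), cov_bm,
                                          np.zeros_like(t), cov_ou))
print(bound)   # upper bound on the adapted 2-Wasserstein distance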
- Wasserstein Distributional Adversarial Training for Deep Neural Networks. Xingjian Bai, Guangyi He, Yifan Jiang, and Jan Obłój. Feb 2025.
The design of adversarial attacks on deep neural networks, as well as methods of adversarial training against them, is the subject of intense research. In this paper, we propose methods to train against distributional attack threats, extending the TRADES method used for pointwise attacks. Our approach leverages recent contributions and relies on sensitivity analysis for Wasserstein distributionally robust optimization problems. We introduce an efficient fine-tuning method which can be deployed on a previously trained model. We test our methods on a range of pre-trained models from RobustBench. The experimental results demonstrate that the additional training enhances Wasserstein distributional robustness while maintaining the original levels of pointwise robustness, even for already very successful networks. The improvements are less marked for models pre-trained using huge synthetic datasets of 20-100M images. Remarkably, however, our methods are sometimes still able to improve performance even when trained using only the original training dataset (50k images). A toy sketch of the first-order budget allocation behind such distributional attacks is given below.
@misc{bai25Wasserstein,
  title = {Wasserstein Distributional Adversarial Training for Deep Neural Networks},
  author = {Bai, Xingjian and He, Guangyi and Jiang, Yifan and Obłój, Jan},
  year = {2025},
  month = feb,
  number = {arXiv:2502.09352},
  eprint = {2502.09352},
  primaryclass = {cs},
  publisher = {arXiv},
}
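For intuition only, here is a first-order reading of what a Wasserstein distributional attack does differently from a pointwise one: linearizing the loss, a perturbation budget constrained only on average is best spent in proportion to each sample's gradient norm. The NumPy sketch below is a heuristic illustration of that sensitivity-analysis viewpoint, not the training procedure of the paper; grads and eps are assumed inputs.

import numpy as np

def w2_budget_allocation(grads, eps):
    # grads: (n, d) per-sample input gradients of the loss; eps: RMS perturbation budget.
    # Maximizing the linearized average loss increase subject to
    # sqrt(mean |delta_i|^2) <= eps gives step sizes proportional to |grad_i|.
    norms = np.linalg.norm(grads, axis=1)
    scale = eps / np.sqrt(np.mean(norms ** 2) + 1e-12)
    directions = grads / (norms[:, None] + 1e-12)
    return (scale * norms)[:, None] * directions   # per-sample perturbations

# Samples with larger loss sensitivity receive more of the shared budget,
# in contrast to the uniform per-sample bound of a pointwise attack.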
2024
- Sensitivity of Causal Distributionally Robust Optimization. Yifan Jiang and Jan Obłój. Aug 2024.
We study causal distributionally robust optimization (DRO) in both discrete- and continuous-time settings. The framework captures model uncertainty, with potential models penalized as a function of their adapted Wasserstein distance to a given reference model. The strength of the penalty is controlled by a real-valued parameter which, in the special case of an indicator penalty, is simply the radius of the uncertainty ball. Our main results derive the first-order sensitivity of the value of causal DRO with respect to the penalization parameter, i.e., we compute the sensitivity to model uncertainty. Moreover, we investigate the case where a martingale constraint is imposed on the underlying model, as is the case for pricing measures in mathematical finance. We introduce different scaling regimes, which allow us to obtain the continuous-time sensitivities as nontrivial limits of their discrete-time counterparts. We illustrate our results with examples. The sensitivities are naturally expressed using optional projections of Malliavin derivatives. To establish our results, we obtain several auxiliary results which are of independent interest. In particular, we introduce pathwise Malliavin derivatives and show that these extend the classical notion. We also establish a novel stochastic Fubini theorem.
@misc{jiang25Sensitivity,
  title = {Sensitivity of Causal Distributionally Robust Optimization},
  author = {Jiang, Yifan and Obłój, Jan},
  year = {2024},
  month = aug,
  number = {arXiv:2408.17109},
  eprint = {2408.17109},
  primaryclass = {math},
  publisher = {arXiv},
}
- The Anytime Convergence of Stochastic Gradient Descent with Momentum: From a Continuous-Time Perspective. Yasong Feng, Yifan Jiang, Tianyu Wang, and Zhiliang Ying. Feb 2024.
In this paper, we study the stochastic optimization problem from a continuous-time perspective. We propose a stochastic first-order algorithm, called Stochastic Gradient Descent with Momentum (SGDM), and show that the trajectory of SGDM, despite its stochastic nature, converges to a deterministic second-order Ordinary Differential Equation (ODE) in \(L_2\)-norm, as the stepsize goes to zero. The connection between the ODE and the algorithm results in delightful patterns in the discrete-time convergence analysis. More specifically, we develop convergence results for the ODE through a Lyapunov function, and translate the whole argument to the discrete-time case. This approach yields a novel anytime convergence guarantee for stochastic gradient methods. Precisely, we prove that, for any \(\beta\), there exists \(k_0\) such that the sequence \(\{x_k\}\) generated by running SGDM on a smooth convex function \(f\) satisfies \[ \mathbb{P}\Biggl(\bigcap_{k=k_0}^{\infty}\biggl\{f(x_k) - f^* \le \frac{C\log k\,\log(2/\beta)}{\sqrt{k}}\biggr\}\Biggr) \ge 1-\beta, \] where \(f^*=\min_{x\in\mathbb{R}^n}f(x)\). Our contribution is significant in that it better captures the convergence behavior across the entire trajectory of the algorithm, rather than at a single iterate. A generic SGDM loop is sketched below for reference.
@misc{FJWY24Anytime,
  title = {The {{Anytime Convergence}} of {{Stochastic Gradient Descent}} with {{Momentum}}: {{From}} a {{Continuous-Time Perspective}}},
  shorttitle = {The {{Anytime Convergence}} of {{Stochastic Gradient Descent}} with {{Momentum}}},
  author = {Feng, Yasong and Jiang, Yifan and Wang, Tianyu and Ying, Zhiliang},
  year = {2024},
  month = feb,
  number = {arXiv:2310.19598},
  eprint = {2310.19598},
  publisher = {arXiv},
}
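For reference, here is a generic stochastic-gradient-with-momentum loop in NumPy. The heavy-ball parametrization, stepsize, and toy objective below are illustrative defaults, not the exact SGDM variant or scaling regime analysed in the paper.

import numpy as np

def sgdm(grad_fn, x0, steps, lr=1e-2, momentum=0.9, rng=None):
    # grad_fn(x, rng) returns an unbiased stochastic gradient of f at x.
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    trajectory = [x.copy()]
    for _ in range(steps):
        v = momentum * v + grad_fn(x, rng)   # momentum buffer
        x = x - lr * v                       # parameter update
        trajectory.append(x.copy())
    return np.array(trajectory)

# Toy run on f(x) = |x|^2 / 2 with noisy gradients grad f(x) = x + noise.
noisy_grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
path = sgdm(noisy_grad, x0=np.ones(5), steps=2000)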
- Duality of Causal Distributionally Robust Optimization: The Discrete-Time Case. Yifan Jiang. Jan 2024.
This paper studies distributionally robust optimization (DRO) in a dynamic context. We consider a general penalized DRO problem with a causal transport-type penalization. Such a penalization naturally captures the information flow generated by the dynamic model. We derive a tractable dynamic duality formula under mild conditions. Furthermore, we apply this duality formula to distributionally robust versions of the average value-at-risk, stochastic control, and optimal stopping problems.
@misc{jiang24Duality,
  title = {Duality of Causal Distributionally Robust Optimization: The Discrete-Time Case},
  shorttitle = {Duality of Causal Distributionally Robust Optimization},
  author = {Jiang, Yifan},
  year = {2024},
  month = jan,
  number = {arXiv:2401.16556},
  eprint = {2401.16556},
  publisher = {arXiv},
}
2023
- Wasserstein Distributional Robustness of Neural Networks. Xingjian Bai, Guangyi He, Yifan Jiang, and Jan Obłój. In Advances in Neural Information Processing Systems, Dec 2023.
Deep neural networks are known to be vulnerable to adversarial attacks (AA). For an image recognition task, this means that a small perturbation of the original can result in the image being misclassified. The design of such attacks, as well as methods of adversarial training against them, is the subject of intense research. We re-cast the problem using techniques of Wasserstein distributionally robust optimization (DRO) and obtain novel contributions leveraging recent insights from DRO sensitivity analysis. We consider a set of distributional threat models. Unlike the traditional pointwise attacks, which assume a uniform bound on the perturbation of each input data point, distributional threat models allow attackers to perturb inputs in a non-uniform way. We link these more general attacks with questions of out-of-sample performance and Knightian uncertainty. To evaluate the distributional robustness of neural networks, we propose a first-order AA algorithm and its multistep version. Our attack algorithms include the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) as special cases. Furthermore, we provide a new asymptotic estimate of the adversarial accuracy against distributional threat models. The bound is fast to compute and first-order accurate, offering new insights even for the pointwise AA. It also naturally yields out-of-sample performance guarantees. We conduct numerical experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets using DNNs from RobustBench to illustrate our theoretical results. Our code is available here. A reference sketch of the pointwise PGD/FGSM special cases is given below.
@inproceedings{BHJO23Wasserstein,
  title = {Wasserstein Distributional Robustness of Neural Networks},
  author = {Bai, Xingjian and He, Guangyi and Jiang, Yifan and Obłój, Jan},
  year = {2023},
  month = dec,
  booktitle = {Advances in Neural Information Processing Systems},
  volume = {36},
  pages = {26322--26347},
  publisher = {Curran Associates, Inc.},
}
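The pointwise special cases named in the abstract are the classical FGSM and PGD attacks; the paper's distributional attack generalizes them by letting the perturbation budget vary across inputs. For orientation, a standard PyTorch sketch of pointwise PGD (FGSM is the single-step case with alpha equal to eps); inputs are assumed to be images scaled to [0, 1].

import torch

def pgd_attack(model, loss_fn, x, y, eps, alpha, steps):
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # gradient ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project onto the L-inf ball
            x_adv = x_adv.clamp(0.0, 1.0)              # stay in the valid pixel range
    return x_adv.detach()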
- Empirical Approximation to Invariant Measures for McKean–Vlasov Processes: Mean-field Interaction vs Self-interaction. Kai Du, Yifan Jiang, and Jinfeng Li. Bernoulli, Aug 2023.
This paper proves that, under a monotonicity condition, the invariant probability measure of a McKean–Vlasov process can be approximated by weighted empirical measures of some processes, including the process itself. These processes are described by distribution-dependent or empirical-measure-dependent stochastic differential equations constructed from the equation for the McKean–Vlasov process. Convergence of the empirical measures is characterized by upper bound estimates for their Wasserstein distances to the invariant measure. Numerical simulations of the mean-field Ornstein–Uhlenbeck process are implemented to demonstrate the theoretical results. A toy self-interacting simulation in this spirit is sketched below.
@article{DJL23Empirical,
  title = {Empirical Approximation to Invariant Measures for {McKean--Vlasov} Processes: Mean-field Interaction vs Self-interaction},
  author = {Du, Kai and Jiang, Yifan and Li, Jinfeng},
  year = {2023},
  month = aug,
  journal = {Bernoulli},
  volume = {29},
  number = {3},
  pages = {2492--2518},
  publisher = {Bernoulli Society for Mathematical Statistics and Probability},
}
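As a rough illustration of the self-interaction idea, the mean-field term of a mean-field Ornstein–Uhlenbeck-type dynamic can be replaced by the mean of the running empirical measure of the path itself. The drift, parameters, and uniform path-averaging weights in this NumPy sketch are illustrative choices, not the ones used in the paper.

import numpy as np

def self_interacting_mfou(a=0.5, sigma=1.0, dt=1e-2, n_steps=200_000, seed=0):
    # Euler scheme for dX_t = -(X_t - a * m_t) dt + sigma dB_t, where the
    # mean-field term m_t = E[X_t] is replaced by the mean of the running
    # empirical measure of the path (self-interaction).
    rng = np.random.default_rng(seed)
    x, running_mean = 0.0, 0.0
    samples = np.empty(n_steps)
    for k in range(n_steps):
        x += -(x - a * running_mean) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        running_mean += (x - running_mean) / (k + 1)   # update the empirical mean
        samples[k] = x
    return samples

# The long-run histogram of `samples` approximates the invariant measure,
# which for this toy linear drift is N(0, sigma**2 / 2) since the stationary mean is 0.
samples = self_interacting_mfou()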
- Sequential Propagation of Chaos. Kai Du, Yifan Jiang, and Xiaochen Li. Jan 2023.
A new class of particle systems with sequential interaction is proposed to approximate the McKean–Vlasov process that originally arises as the limit of the mean-field interacting particle system. The weighted empirical measure of this particle system is proved to converge to the law of the McKean–Vlasov process as the system grows. Based on the Wasserstein metric, quantitative propagation of chaos results are obtained for two cases: finite-time estimates under a monotonicity condition, and uniform-in-time estimates under dissipativity and non-degeneracy conditions. Numerical experiments are implemented to demonstrate the theoretical results. A toy implementation of the sequential particle system is sketched below.
@misc{DJL23Sequential,
  title = {Sequential Propagation of Chaos},
  author = {Du, Kai and Jiang, Yifan and Li, Xiaochen},
  year = {2023},
  month = jan,
  number = {arXiv:2301.09913},
  eprint = {2301.09913},
  publisher = {arXiv},
}
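In the same toy setting as above, sequential interaction can be sketched as follows: particle i is simulated using the empirical measure of the previously generated particles in place of the unknown law of the McKean–Vlasov process. The linear drift and the uniform (unweighted) empirical measure below are simplifications for illustration, not the weighted scheme of the paper.

import numpy as np

def sequential_particle_system(n_particles=500, a=0.5, sigma=1.0, T=1.0, n_steps=100, seed=0):
    # Particle i sees, at each time step, the empirical mean of particles 0..i-1
    # at that step instead of the law of the McKean-Vlasov process.
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    paths = np.zeros((n_particles, n_steps + 1))
    for i in range(n_particles):
        x = 0.0
        for k in range(n_steps):
            m = paths[:i, k].mean() if i > 0 else 0.0
            x += -(x - a * m) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
            paths[i, k + 1] = x
    return paths   # paths[:, -1] gives the empirical approximation of the law at time T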
2022
- Existence and Distributional Chaos of Points that are Recurrent but Not Banach Recurrent. Yifan Jiang and Xueting Tian. Journal of Dynamics and Differential Equations, Apr 2022.
According to their frequency of recurrence, recurrent points fall into many levels, such as periodic points, almost periodic points, weakly almost periodic points, quasi-weakly almost periodic points, and Banach recurrent points. In this paper, we consider symbolic dynamics and show the existence of six refined levels between Banach recurrence and general recurrence. Although these refined levels all have zero measure under any invariant measure, we further show that they carry strong topological complexity. Each refined level of recurrent points is dense in the whole space and contains an uncountable distributionally chaotic subset. For a wide range of dynamical systems, such as expansive systems with the shadowing property, we also show the distributional chaos of points that are recurrent but not Banach recurrent.
@article{JT22Existence,
  title = {Existence and Distributional Chaos of Points that are Recurrent but Not {Banach} Recurrent},
  author = {Jiang, Yifan and Tian, Xueting},
  year = {2022},
  month = apr,
  journal = {Journal of Dynamics and Differential Equations},
  issn = {1572-9222},
  doi = {10.1007/s10884-022-10158-x},
}
2021
- Convergence of the Deep BSDE Method for FBSDEs with Non-Lipschitz Coefficients. Yifan Jiang and Jinfeng Li. Probability, Uncertainty and Quantitative Risk, Dec 2021.
This paper is dedicated to the numerical solution of high-dimensional coupled FBSDEs with non-Lipschitz diffusion coefficients. Under mild conditions, we provide an a posteriori estimate of the numerical solution that holds for any time duration. This a posteriori estimate validates the convergence of the recently proposed Deep BSDE method. In addition, we develop a numerical scheme based on the Deep BSDE method and present numerical examples from financial markets to demonstrate its high performance. A minimal Deep BSDE training loop is sketched below for reference.
@article{JL21Convergence,
  title = {Convergence of the {Deep BSDE} Method for {FBSDEs} with Non-{Lipschitz} Coefficients},
  author = {Jiang, Yifan and Li, Jinfeng},
  year = {2021},
  month = dec,
  journal = {Probability, Uncertainty and Quantitative Risk},
  volume = {6},
  number = {4},
  pages = {391--408},
  publisher = {American Institute of Mathematical Sciences},
}
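For context, the basic Deep BSDE training loop (in the spirit of the original formulation by E, Han, and Jentzen) parametrizes Y_0 and the Z-processes by neural networks and fits them by matching the terminal condition. The PyTorch sketch below treats a toy decoupled FBSDE with illustrative dimensions, driver, and networks; the paper itself analyses coupled FBSDEs with non-Lipschitz diffusion coefficients.

import torch
import torch.nn as nn

# Toy decoupled FBSDE: dX = sigma dW,  -dY = f(t, X, Y, Z) dt - Z dW,  Y_T = g(X_T).
d, N, T, batch = 10, 20, 1.0, 256
dt = T / N
sigma = 1.0
f = lambda t, x, y, z: -0.05 * y                     # illustrative driver
g = lambda x: (x ** 2).sum(dim=1, keepdim=True)      # illustrative terminal condition

y0 = nn.Parameter(torch.zeros(1))                    # learnable Y_0
z_nets = nn.ModuleList([                             # Z_{t_n} approximated by z_nets[n](X_n)
    nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, d)) for _ in range(N)
])
opt = torch.optim.Adam([y0, *z_nets.parameters()], lr=1e-3)

for _ in range(2000):
    x = torch.zeros(batch, d)
    y = y0.expand(batch, 1)
    for n in range(N):
        dw = torch.randn(batch, d) * dt ** 0.5
        z = z_nets[n](x)
        y = y - f(n * dt, x, y, z) * dt + (z * dw).sum(dim=1, keepdim=True)
        x = x + sigma * dw                           # Euler step of the forward process
    loss = ((y - g(x)) ** 2).mean()                  # penalize mismatch with Y_T = g(X_T)
    opt.zero_grad()
    loss.backward()
    opt.step()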