In this paper, we study the stochastic optimization problem from a continuous-time perspective. We propose a stochastic first-order algorithm, called Stochastic Gradient Descent with Momentum (SGDM), and show that the trajectory of SGDM, despite its stochastic nature, converges to a deterministic second-order Ordinary Differential Equation (ODE) in \(L_2\)-norm, as the stepsize goes to zero. The connection between the ODE and the algorithm results in delightful patterns in the discrete-time convergence analysis. More specifically, we develop convergence results for the ODE through a Lyapunov function, and translate the whole argument to the discrete-time case. This approach yields a novel anytime convergence guarantee for stochastic gradient methods. Precisely, we prove that, for any \(β\), there exists \(k_0\)such that the sequence \({x_k}\)governed by running SGDM on a smooth convex function \(f\)satisfies \[ \bbP\Biggl(\mathop⋂_{k=k_0}^∞\biggl\{f (x_k) - f^* ≤\frac{C\log k \log(2/β)}{\sqrt{k}}\biggr\}\Biggr)≥1-β,\]where \(f^*=\min_{x∈\bbR^n}f(x)\). Our contribution is significant in that it better captures the convergence behavior across the entire trajectory of the algorithm, rather than at a single iterate.
Duality of Causal Distributionally Robust Optimization: The Discrete-Time Case
This paper studies distributionally robust optimization (DRO) in a dynamic context. We consider a general penalized DRO problem with a causal transport-type penalization. Such a penalization naturally captures the information flow generated by the dynamic model. We derive a tractable dynamic duality formula under mild conditions. Furthermore, we apply this duality formula to address distributionally robust version of average value-at-risk, stochastic control, and optimal stopping.
2023
Wasserstein Distributional Robustness of Neural Networks
Xingjian Bai, Guangyi He, Yifan Jiang, and Jan Obłój
In Advances in Neural Information Processing Systems , Dec 2023
Deep neural networks are known to be vulnerable to adversarial attacks (AA). For an image recognition task, this means that a small perturbation of the original can result in the image being misclassified. Design of such attacks as well as methods of adversarial training against them are subject of intense research. We re-cast the problem using techniques of Wasserstein distributionally robust optimization (DRO) and obtain novel contributions leveraging recent insights from DRO sensitivity analysis. We consider a set of distributional threat models. Unlike the traditional pointwise attacks, which assume a uniform bound on perturbation of each input data point, distributional threat models allow attackers to perturb inputs in a non-uniform way. We link these more general attacks with questions of out-of-sample performance and Knightian uncertainty. To evaluate the distributional robustness of neural networks, we propose a first-order AA algorithm and its multistep version. Our attack algorithms include Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) as special cases. Furthermore, we provide a new asymptotic estimate of the adversarial accuracy against distributional threat models. The bound is fast to compute and first-order accurate, offering new insights even for the pointwise AA. It also naturally yields out-of-sample performance guarantees. We conduct numerical experiments on CIFAR-10, CIFAR-100, ImageNet datasets using DNNs on RobustBench to illustrate our theoretical results. Our code is available at here.
Empirical Approximation to Invariant Measures for McKean–Vlasov Processes: Mean-field Interaction vs Self-interaction
This paper proves that, under a monotonicity condition, the invariant probability measure of a McKean–Vlasov process can be approximated by weighted empirical measures of some processes including itself. These processes are described by distribution dependent or empirical measure dependent stochastic differential equations constructed from the equation for the McKean–Vlasov process. Convergence of empirical measures is characterized by upper bound estimates for their Wasserstein distances to the invariant measure. Numerical simulations of the mean-field Ornstein–Uhlenbeck process are implemented to demonstrate the theoretical results.
A new class of particle systems with sequential interaction is proposed to approximate the McKean-Vlasov process that originally arises as the limit of the mean-field interacting particle system. The weighted empirical measure of this particle system is proved to converge to the law of the McKean-Vlasov process as the system grows. Based on the Wasserstein metric, quantitative propagation of chaos results are obtained for two cases: the finite time estimates under the monotonicity condition and the uniform in time estimates under the dissipation and the non-degenerate conditions. Numerical experiments are implemented to demonstrate the theoretical results.
2022
Existence and Distributional Chaos of Points that are Recurrent but Not Banach Recurrent
Yifan Jiang, and Xueting Tian
Journal of Dynamics and Differential Equations, Apr 2022
According to the recurrent frequency, many levels of recurrent points are found such as periodic points, almost periodic points, weakly almost periodic points, quasi-weakly almost periodic points and Banach recurrent points. In this paper, we consider symbolic dynamics and show the existence of six refined levels between Banach recurrence and general recurrence. Despite the fact that these refined levels are all null-measure under any invariant measure, we further show they carry strong topological complexity. Each refined level of recurrent points is dense in the whole space and contains an uncountable distributionally chaotic subset. For a wide range of dynamical systems such as expansive systems with the shadowing property, we also show the distributional chaos of the points that are recurrent but not Banach recurrent.
2021
Convergence of the Deep BSDE Method for FBSDEs with Non-Lipschitz Coefficients
Yifan Jiang, and Jinfeng Li
Probability, Uncertainty and Quantitative Risk, Dec 2021
This paper is dedicated to solving high-dimensional coupled FBSDEs with non-Lipschitz diffusion coefficients numerically. Under mild conditions, we provided a posterior estimate of the numerical solution that holds for any time duration. This posterior estimate validates the convergence of the recently proposed Deep BSDE method. In addition, we developed a numerical scheme based on the Deep BSDE method and presented numerical examples in financial markets to demonstrate the high performance.