Optimization and Control
Showing new listings for Thursday, 21 November 2024
- [1] arXiv:2411.12898 [pdf, other]
Title: Problem-dependent convergence bounds for randomized linear gradient compression
Comments: 15 pages, 3 figures
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
In distributed optimization, the communication of model updates can be a performance bottleneck. Consequently, gradient compression has been proposed as a means of increasing optimization throughput. In general, due to information loss, compression introduces a penalty on the number of iterations needed to reach a solution. In this work, we investigate how the iteration penalty depends on the interaction between compression and problem structure, in the context of non-convex stochastic optimization. We focus on linear compression schemes, where compression and decompression can be modeled as multiplication with a random matrix. We consider several distributions of matrices, among them random orthogonal matrices and matrices with random Gaussian entries. We find that in each case, the impact of compression on convergence can be quantified in terms of the norm of the Hessian of the objective, using a norm defined by the compression scheme. The analysis reveals that in certain cases, compression performance is related to low-rank structure or other spectral properties of the problem. In these cases, our bounds predict that the penalty introduced by compression is significantly reduced compared to worst-case bounds that only consider the compression level, ignoring problem data. We verify the theoretical findings on several optimization problems, including fine-tuning an image classification model.
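As a minimal illustration of the kind of linear compression scheme considered (a random Gaussian sketch standing in for the compression/decompression pair; illustrative only, not the paper's exact estimator or scaling):

```python
import numpy as np

def sketch_gradient(grad, k, rng):
    """Compress a gradient with a k x d random Gaussian matrix and form a
    simple linear reconstruction (illustrative sketch, not the paper's method)."""
    d = grad.shape[0]
    S = rng.normal(scale=1.0 / np.sqrt(k), size=(k, d))   # E[S^T S] = I, so the reconstruction is unbiased
    compressed = S @ grad              # what a worker would communicate
    reconstructed = S.T @ compressed   # linear decompression on the receiver side
    return compressed, reconstructed

rng = np.random.default_rng(0)
g = rng.normal(size=1000)
_, g_hat = sketch_gradient(g, k=100, rng=rng)
print("relative reconstruction error:", np.linalg.norm(g_hat - g) / np.linalg.norm(g))
```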
- [2] arXiv:2411.13111 [pdf, html, other]
Title: Optimal investment problem of a renewal risk model with generalized Erlang distributed interarrival times
Subjects: Optimization and Control (math.OC)
This paper explores the optimal investment problem of a renewal risk model with generalized Erlang distributed interarrival times. We assume that the phases of the interarrival time can be observed. The price of the risky asset is driven by the CEV model, and the insurer aims to maximize the exponential utility of terminal wealth through asset allocation. By solving the corresponding Hamilton-Jacobi-Bellman equation, we establish, when the interest rate is zero, the concavity of the solution as well as the explicit expression of the investment policy. When the interest rate is nonzero, we derive the explicit expression of the optimal investment strategy and prove the structure and concavity of the value function.
- [3] arXiv:2411.13151 [pdf, html, other]
Title: Enhancements of Fragment Based Algorithms for Vehicle Routing Problems
Subjects: Optimization and Control (math.OC)
The method of fragments was recently proposed, and its effectiveness has been empirically shown for three specialised pickup and delivery problems. We propose an enhanced fragment algorithm that, for the first time, effectively solves the Pickup and Delivery Problem with Time Windows. Additionally, we describe the approach in general terms to exemplify its theoretical applicability to vehicle routing problems without pickup and delivery requirements. We then apply it to the Truck-Based Drone Delivery Routing Problem with Time Windows. The algorithm uses a fragment formulation rather than a route-based one. The definition of a fragment is problem specific, but generally, fragments can be thought of as enumerable segments of routes with a particular structure. A resource-expanded network is constructed from the fragments and is iteratively updated via dynamic discretization discovery. Additionally, we introduce two new concepts, called formulation leveraging and column enumeration for row elimination, that are crucial for solving difficult problems. These use the strong linear relaxation of the route formulation to strengthen the fragment formulation. We test our algorithm on instances of the Pickup and Delivery Problem with Time Windows and the Truck-Based Drone Delivery Routing Problem with Time Windows. Our approach is competitive with, or outperforms, the state-of-the-art algorithm for both.
- [4] arXiv:2411.13219 [pdf, html, other]
Title: Backward Stochastic Control System with Entropy Regularization
Subjects: Optimization and Control (math.OC)
Entropy regularization, inspired by information entropy in machine learning and by the exploration-exploitation ideas of reinforcement learning, is introduced into the control problem to design approximating algorithms for the optimal control. This paper is concerned with the optimal exploratory control of a backward stochastic system, generated by a backward stochastic differential equation, with entropy regularization in its cost functional. We give a theoretical characterization of the optimal relaxed control so as to lay the foundation for applying such a backward stochastic control system to mathematical finance and algorithm implementation. To this end, we first establish the stochastic maximum principle by the convex variation method. Then we prove a sufficient condition for optimality and derive the implicit form of the optimal control. Finally, the existence and uniqueness of the optimal control for the backward linear-quadratic control problem with entropy regularization are proved by decoupling techniques.
- [5] arXiv:2411.13234 [pdf, html, other]
Title: Extremum and Nash Equilibrium Seeking with Delays and PDEs: Designs & Applications
Comments: Preprint submitted to IEEE Control Systems Magazine (Special Issue: Into the Second Century of Extremum Seeking Control, 38 pages and 34 figures)
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
The development of extremum seeking (ES) has progressed, over the past hundred years, from static maps, to finite-dimensional dynamic systems, to networks of static and dynamic agents. Extending ES from ODE dynamics to maps and agents that incorporate delays or even partial differential equations (PDEs) is the next natural step in that progression through ascending research challenges. This paper reviews results on algorithm design and theory of ES for such infinite-dimensional systems. Both hyperbolic and parabolic dynamics are presented: delays or transport equations, heat-dominated equations, wave equations, and reaction-advection-diffusion equations. Nash equilibrium seeking (NES) methods are introduced for noncooperative game scenarios of the model-free kind and then specialized to single-agent optimization. Even heterogeneous PDE games, such as a duopoly with one parabolic and one hyperbolic agent, are considered. Several engineering applications are touched upon for illustration, including flow-traffic control for urban mobility, oil-drilling systems, deep-sea cable-actuated source seeking, additive manufacturing modeled by the Stefan PDE, biological reactors, light-source seeking with flexible-beam structures, and neuromuscular electrical stimulation.
- [6] arXiv:2411.13267 [pdf, html, other]
Title: ripALM: A Relative-Type Inexact Proximal Augmented Lagrangian Method with Applications to Quadratically Regularized Optimal Transport
Subjects: Optimization and Control (math.OC)
Inexact proximal augmented Lagrangian methods (pALMs) are particularly appealing for tackling convex constrained optimization problems because of their elegant convergence properties and strong practical performance. To solve the associated pALM subproblems, efficient methods such as Newton-type methods are essential. Consequently, the effectiveness of the inexact pALM hinges on the error criteria used to control the inexactness when solving these subproblems. However, existing inexact pALMs either rely on absolute-type error criteria (which may complicate implementation by necessitating the pre-specification of an infinite sequence of error tolerance parameters) or require an additional correction step when using relative error criteria (which can potentially slow down the convergence of the pALM). To address this deficiency, this paper proposes ripALM, a relative-type inexact pALM, which can simplify practical implementation while preserving the appealing convergence properties of the classical absolute-type inexact pALM. We emphasize that ripALM is the first relative-type inexact version of the vanilla pALM with provable convergence guarantees. Numerical experiments on quadratically regularized optimal transport (OT) problems demonstrate the competitive efficiency of the proposed method compared to existing methods. As our analysis can be extended to a more general convex constrained problem setting, including other regularized OT problems, the proposed ripALM may provide broad applicability and has the potential to serve as a basic optimization tool.
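For orientation, the two families of subproblem error criteria contrasted above typically take the following generic forms in the inexact proximal/augmented-Lagrangian literature (schematic notation; the paper's precise criteria may differ):

```latex
\[
\text{absolute-type:}\quad
\mathrm{dist}\big(0,\partial_x \mathcal{L}_{\sigma_k}(x_{k+1},y_k)\big)\le \varepsilon_k,
\qquad \sum_{k\ge 0}\varepsilon_k<\infty \ \text{(pre-specified)},
\]
\[
\text{relative-type:}\quad
\mathrm{dist}\big(0,\partial_x \mathcal{L}_{\sigma_k}(x_{k+1},y_k)\big)\le \tau\,\|x_{k+1}-x_k\|,
\qquad \tau\in(0,1)\ \text{fixed}.
\]
```

The practical appeal of the relative form is that the tolerance is tied to the progress of the iterates, so no summable tolerance sequence has to be chosen in advance.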
- [7] arXiv:2411.13276 [pdf, html, other]
Title: Analysis and Synthesis Denoisers for Forward-Backward Plug-and-Play Algorithms
Subjects: Optimization and Control (math.OC); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Signal Processing (eess.SP)
In this work we study the behavior of the forward-backward (FB) algorithm when the proximity operator is replaced by a sub-iterative procedure approximating a Gaussian denoiser, in a Plug-and-Play (PnP) fashion. In particular, we consider both analysis and synthesis Gaussian denoisers within a dictionary framework, obtained by unrolling dual-FB iterations or FB iterations, respectively. We analyze the associated minimization problems as well as the asymptotic behavior of the resulting FB-PnP iterations. In particular, we show that the synthesis Gaussian denoising problem can be viewed as a proximity operator. For each case, analysis and synthesis, we show that the FB-PnP algorithms solve the same problem whether we use only one or an infinite number of sub-iterations to solve the denoising problem at each iteration. To this end, we show that each "one sub-iteration" strategy within the FB-PnP can be interpreted as a primal-dual algorithm when a warm-restart strategy is used. We further present similar results when using a Moreau-Yosida smoothing of the global problem, for an arbitrary number of sub-iterations. Finally, we provide numerical simulations to illustrate our theoretical results. In particular, we consider a toy compressive sensing example as well as an image restoration problem in a deep dictionary framework.
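A minimal sketch of the FB-PnP iteration studied here, with a simple soft-thresholding denoiser standing in for the (possibly sub-iterative) analysis/synthesis Gaussian denoisers of the paper; the toy problem, names, and parameter values are illustrative assumptions:

```python
import numpy as np

def fb_pnp(x0, grad_f, denoiser, step, n_iter=200):
    """Forward-backward Plug-and-Play: a gradient step on the data-fidelity
    term, followed by a denoiser in place of the proximity operator."""
    x = x0.copy()
    for _ in range(n_iter):
        x = denoiser(x - step * grad_f(x))
    return x

# Toy compressive-sensing-like example (illustrative only).
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 50))
x_true = rng.normal(size=50) * (rng.random(50) < 0.1)     # sparse signal
y = A @ x_true

grad_f = lambda x: A.T @ (A @ x - y)                      # gradient of 0.5*||Ax - y||^2
step = 1.0 / np.linalg.norm(A, 2) ** 2                    # 1 / Lipschitz constant
thresh = 0.1 * step
denoiser = lambda z: np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)  # soft-thresholding

x_hat = fb_pnp(np.zeros(50), grad_f, denoiser, step)
print("nonzeros recovered:", np.count_nonzero(np.abs(x_hat) > 1e-6),
      "true nonzeros:", np.count_nonzero(x_true))
```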
- [8] arXiv:2411.13394 [pdf, html, other]
Title: CB$^2$O: Consensus-Based Bi-Level Optimization
Subjects: Optimization and Control (math.OC); Analysis of PDEs (math.AP)
Bi-level optimization problems, where one wishes to find the global minimizer of an upper-level objective function over the globally optimal solution set of a lower-level objective, arise in a variety of scenarios throughout science and engineering, machine learning, and artificial intelligence. In this paper, we propose and investigate, analytically and experimentally, consensus-based bi-level optimization (CB$^2$O), a multi-particle metaheuristic derivative-free optimization method designed to solve bi-level optimization problems when both objectives may be nonconvex. Our method leverages within the computation of the consensus point a carefully designed particle selection principle implemented through a suitable choice of a quantile on the level of the lower-level objective, together with a Laplace principle-type approximation w.r.t. the upper-level objective function, to ensure that the bi-level optimization problem is solved in an intrinsic manner. We give an existence proof of solutions to a corresponding mean-field dynamics, for which we first establish the stability of our consensus point w.r.t. a combination of Wasserstein and $L^2$ perturbations, and consecutively resort to PDE considerations extending the classical Picard iteration to construct a solution. For such a solution, we provide a global convergence analysis in mean-field law, showing that the solution of the associated nonlinear nonlocal Fokker-Planck equation converges exponentially fast to the unique solution of the bi-level optimization problem provided suitable choices of the hyperparameters. The practicability and efficiency of our CB$^2$O algorithm are demonstrated through extensive numerical experiments in the settings of constrained global optimization, sparse representation learning, and robust (clustered) federated learning.
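The particle-selection idea described above can be sketched in a few lines: keep a quantile of the particles with the best lower-level values, then form a Laplace-principle-style (Gibbs-weighted) average with respect to the upper-level objective. The toy functions and parameter values below are hypothetical, not the paper's algorithm verbatim:

```python
import numpy as np

def consensus_point(X, lower, upper, beta=0.2, alpha=50.0):
    """Consensus point for a bi-level problem (illustrative sketch):
    quantile selection on the lower level, Gibbs weighting on the upper level."""
    G = np.array([lower(x) for x in X])
    H = np.array([upper(x) for x in X])
    keep = G <= np.quantile(G, beta)                 # beta-quantile best lower-level particles
    w = np.exp(-alpha * (H[keep] - H[keep].min()))   # stabilized Laplace/Gibbs weights
    return (w[:, None] * X[keep]).sum(axis=0) / w.sum()

# Toy bi-level instance: the lower level has two minimizers, the upper level picks one.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(500, 2))
lower = lambda x: (x[0] ** 2 - 1.0) ** 2           # minimizers near x0 = +/- 1
upper = lambda x: (x[0] - 1.0) ** 2 + x[1] ** 2    # selects the branch near (1, 0)
print(consensus_point(X, lower, upper))
```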
New submissions (showing 8 of 8 entries)
- [9] arXiv:2411.04293 (cross-list from cs.AI) [pdf, html, other]
Title: A Random-Key Optimizer for Combinatorial Optimization
Authors: Antonio A. Chaves, Mauricio G.C. Resende, Martin J.A. Schuetz, J. Kyle Brubaker, Helmut G. Katzgraber, Edilson F. de Arruda, Ricardo M. A. Silva
Comments: 54 pages, 16 figures, 8 tables
Subjects: Artificial Intelligence (cs.AI); Disordered Systems and Neural Networks (cond-mat.dis-nn); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)
This paper presents the Random-Key Optimizer (RKO), a versatile and efficient stochastic local search method tailored for combinatorial optimization problems. Using the random-key concept, RKO encodes solutions as vectors of random keys that are subsequently decoded into feasible solutions via problem-specific decoders. The RKO framework is able to combine a plethora of classic metaheuristics, each capable of operating independently or in parallel, with solution sharing facilitated through an elite solution pool. This modular approach allows for the adaptation of various metaheuristics, including simulated annealing, iterated local search, and greedy randomized adaptive search procedures, among others. The efficacy of the RKO framework, implemented in C++, is demonstrated through its application to three NP-hard combinatorial optimization problems: the alpha-neighborhood p-median problem, the tree of hubs location problem, and the node-capacitated graph partitioning problem. The results highlight the framework's ability to produce high-quality solutions across diverse problem domains, underscoring its potential as a robust tool for combinatorial optimization.
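For readers new to random-key encodings: every metaheuristic operates on vectors of keys in [0,1), and only the problem-specific decoder turns keys into a feasible solution. A generic permutation decoder and a simple perturbation are sketched below (illustrative Python, not the paper's C++ implementation; the `shake` perturbation is an assumed example of an operator, not necessarily one RKO uses):

```python
import numpy as np

def decode_permutation(keys):
    """Generic random-key decoder: sort the keys and read off a visiting order.
    Real decoders are problem-specific (tours, assignments, partitions, ...)."""
    return np.argsort(keys)

def shake(keys, rate, rng):
    """Simple perturbation on the key space: re-sample a fraction of the keys."""
    keys = keys.copy()
    idx = rng.random(keys.size) < rate
    keys[idx] = rng.random(idx.sum())
    return keys

rng = np.random.default_rng(0)
keys = rng.random(8)                 # one candidate solution encoded as 8 random keys
print(decode_permutation(keys))      # decoded order
print(decode_permutation(shake(keys, 0.25, rng)))
```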
- [10] arXiv:2411.12786 (cross-list from stat.ML) [pdf, html, other]
Title: Off-policy estimation with adaptively collected data: the power of online learning
Comments: 37 pages. Accepted to the 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024), Vancouver, British Columbia, Canada
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)
We consider estimation of a linear functional of the treatment effect using adaptively collected data. This task finds a variety of applications, including off-policy evaluation (\textsf{OPE}) in contextual bandits and estimation of the average treatment effect (\textsf{ATE}) in causal inference. While a certain class of augmented inverse propensity weighting (\textsf{AIPW}) estimators enjoys desirable asymptotic properties, including semi-parametric efficiency, much less is known about their non-asymptotic theory with adaptively collected data. To fill this gap, we first establish generic upper bounds on the mean-squared error of the class of AIPW estimators that crucially depend on a sequentially weighted error between the treatment effect and its estimates. Motivated by this, we also propose a general reduction scheme that allows one to produce a sequence of estimates for the treatment effect via online learning to minimize the sequentially weighted estimation error. To illustrate this, we provide three concrete instantiations in (i) the tabular case, (ii) the case of linear function approximation, and (iii) the case of general function approximation for the outcome model. We then provide a local minimax lower bound to show the instance-dependent optimality of the \textsf{AIPW} estimator using no-regret online learning algorithms.
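For reference, the textbook AIPW estimator of the ATE, with outcome-model estimates $\hat{\mu}_a$ and (known, possibly adaptive) treatment probabilities $\hat{e}_t$, has the form below; the paper studies sequentially weighted variants of this class, so this is context rather than the paper's exact estimator:

```latex
\[
\hat{\tau}_{\mathrm{AIPW}}
= \frac{1}{n}\sum_{t=1}^{n}\left(
\hat{\mu}_1(X_t)-\hat{\mu}_0(X_t)
+\frac{A_t\,\big(Y_t-\hat{\mu}_1(X_t)\big)}{\hat{e}_t(X_t)}
-\frac{(1-A_t)\,\big(Y_t-\hat{\mu}_0(X_t)\big)}{1-\hat{e}_t(X_t)}
\right).
\]
```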
- [11] arXiv:2411.12995 (cross-list from stat.ML) [pdf, html, other]
Title: Eliminating Ratio Bias for Gradient-based Simulated Parameter Estimation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
This article addresses the challenge of parameter calibration in stochastic models where the likelihood function is not analytically available. We propose a gradient-based simulated parameter estimation framework, leveraging a multi-time scale algorithm that tackles the issue of ratio bias in both maximum likelihood estimation and posterior density estimation problems. Additionally, we introduce a nested simulation optimization structure, providing theoretical analyses including strong convergence, asymptotic normality, convergence rate, and budget allocation strategies for the proposed algorithm. The framework is further extended to neural network training, offering a novel perspective on stochastic approximation in machine learning. Numerical experiments show that our algorithm can improve the estimation accuracy and save computational costs.
- [12] arXiv:2411.13083 (cross-list from cs.LG) [pdf, html, other]
Title: Omnipredicting Single-Index Models with Multi-Index Models
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC); Machine Learning (stat.ML)
Recent work on supervised learning [GKR+22] defined the notion of omnipredictors, i.e., predictor functions $p$ over features that are simultaneously competitive for minimizing a family of loss functions $\mathcal{L}$ against a comparator class $\mathcal{C}$. Omniprediction requires approximating the Bayes-optimal predictor beyond the loss minimization paradigm, and has generated significant interest in the learning theory community. However, even for basic settings such as agnostically learning single-index models (SIMs), existing omnipredictor constructions require impractically-large sample complexities and runtimes, and output complex, highly-improper hypotheses.
Our main contribution is a new, simple construction of omnipredictors for SIMs. We give a learner outputting an omnipredictor that is $\varepsilon$-competitive on any matching loss induced by a monotone, Lipschitz link function, when the comparator class is bounded linear predictors. Our algorithm requires $\approx \varepsilon^{-4}$ samples and runs in nearly-linear time, and its sample complexity improves to $\approx \varepsilon^{-2}$ if link functions are bi-Lipschitz. This significantly improves upon the only prior known construction, due to [HJKRR18, GHK+23], which used $\gtrsim \varepsilon^{-10}$ samples.
We achieve our construction via a new, sharp analysis of the classical Isotron algorithm [KS09, KKKS11] in the challenging agnostic learning setting, of potential independent interest. Previously, Isotron was known to properly learn SIMs in the realizable setting, as well as constant-factor competitive hypotheses under the squared loss [ZWDD24]. As they are based on Isotron, our omnipredictors are multi-index models with $\approx \varepsilon^{-2}$ prediction heads, bringing us closer to the tantalizing goal of proper omniprediction for general loss families and comparators.
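Since the construction builds on Isotron, a bare-bones version of the classical algorithm is easy to state: alternate an isotonic-regression fit of the link function with a perceptron-like update of the linear part. The sketch below is the textbook realizable-setting recipe on hypothetical toy data; the paper's agnostic analysis and multi-index extension add substantially more:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def isotron(X, y, n_iter=50):
    """Bare-bones Isotron: fit the link by isotonic regression on the current
    projections, then take a unit-step perceptron-like update of the weights."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        z = X @ w
        u = IsotonicRegression(out_of_bounds="clip").fit(z, y).predict(z)
        w = w + (y - u) @ X / n
    return w

# Toy single-index data: y = sigmoid(<w*, x>) + small noise (illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
w_star = np.array([1.0, -2.0, 0.5, 0.0, 0.0])
y = 1.0 / (1.0 + np.exp(-X @ w_star)) + 0.05 * rng.normal(size=400)
w_hat = isotron(X, y)
print("alignment with w*:", np.corrcoef(X @ w_hat, X @ w_star)[0, 1])
```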
- [13] arXiv:2411.13169 (cross-list from cs.LG) [pdf, html, other]
Title: A Unified Analysis for Finite Weight Averaging
Comments: 34 pages
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Averaging the iterates of Stochastic Gradient Descent (SGD) has achieved empirical success in training deep learning models, through methods such as Stochastic Weight Averaging (SWA), Exponential Moving Average (EMA), and LAtest Weight Averaging (LAWA). In particular, as a finite weight averaging method, LAWA can attain faster convergence and better generalization. However, its theoretical explanation is still less explored, since there are fundamental differences between finite and infinite settings. In this work, we first generalize SGD and LAWA as Finite Weight Averaging (FWA) and explain their advantages over SGD from the perspective of optimization and generalization. A key challenge in analyzing FWA's convergence is that traditional methods, stated in terms of expectations or optimal values for infinite settings, are inapplicable. Second, the cumulative gradients introduced by FWA bring additional difficulties to the generalization analysis, especially making it harder to treat them under different assumptions. Extending the final-iterate convergence analysis to FWA, this paper, under a convexity assumption, establishes a convergence bound $\mathcal{O}(\log\left(\frac{T}{k}\right)/\sqrt{T})$, where $k\in[1, T/2]$ is a constant representing the last $k$ iterations. Compared to SGD with $\mathcal{O}(\log(T)/\sqrt{T})$, we prove theoretically that FWA has a faster convergence rate and explain the effect of the number of averaged points. In the generalization analysis, we find a recursive representation for bounding the cumulative gradient using mathematical induction. We provide bounds for constant and decaying learning rates and for the convex and non-convex cases to show the good generalization performance of FWA. Finally, experimental results on several benchmarks verify our theoretical results.
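Concretely, finite weight averaging as used here returns an average of only the last $k$ SGD iterates (uniform weights shown for illustration; LAWA-style schemes may weight checkpoints differently):

```latex
\[
x_{t+1} = x_t - \eta_t\,\nabla f(x_t;\xi_t),
\qquad
\bar{x}_T^{\mathrm{FWA}} = \frac{1}{k}\sum_{t=T-k+1}^{T} x_t,
\qquad k\in[1,\,T/2].
\]
```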
- [14] arXiv:2411.13293 (cross-list from econ.TH) [pdf, other]
Title: Revealed Information
Subjects: Theoretical Economics (econ.TH); Econometrics (econ.EM); Optimization and Control (math.OC)
An analyst observes the frequency with which a decision maker (DM) takes actions, but does not observe the frequency of actions conditional on the payoff-relevant state. We ask when the analyst can rationalize the DM's choices as if the DM first learns something about the state before taking action. We provide a support function characterization of the triples of utility functions, prior beliefs, and (marginal) distributions over actions such that the DM's action distribution is consistent with information given the agent's prior and utility function. Assumptions on the cardinality of the state space and on the utility function allow us to refine this characterization, obtaining a sharp system of finitely many inequalities that the utility function, prior, and action distribution must satisfy. We apply our characterization to study comparative statics and ring-network games, and to identify conditions under which a data set is consistent with a public information structure in first-order Bayesian persuasion games. We characterize the set of distributions over posterior beliefs that are consistent with the DM's choices. Assuming the first-order approach applies, we extend our results to settings with a continuum of actions and/or states.
- [15] arXiv:2411.13404 (cross-list from eess.SY) [pdf, other]
Title: Issues with Input-Space Representation in Nonlinear Data-Based Dissipativity Estimation
Comments: Preprint of conference manuscript, currently under review
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
In data-based control, dissipativity can be a powerful tool for attaining stability guarantees for nonlinear systems if that dissipativity can be inferred from data. This work provides a tutorial on several existing methods for data-based dissipativity estimation of nonlinear systems. The interplay between the underlying assumptions of these methods and their sample complexity is investigated. It is shown that methods based on delta-covering result in an intractable trade-off between sample complexity and robustness. A new method is proposed to quantify the robustness of machine learning-based dissipativity estimation. It is shown that this method achieves a more tractable trade-off between robustness and sample complexity. Several numerical case studies demonstrate the results.
- [16] arXiv:2411.13443 (cross-list from math.NA) [pdf, other]
Title: Nonlinear Assimilation with Score-based Sequential Langevin Sampling
Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC); Machine Learning (stat.ML)
This paper presents a novel approach for nonlinear assimilation called score-based sequential Langevin sampling (SSLS) within a recursive Bayesian framework. SSLS decomposes the assimilation process into a sequence of prediction and update steps, utilizing dynamic models for prediction and observation data for updating via score-based Langevin Monte Carlo. An annealing strategy is incorporated to enhance convergence and facilitate multi-modal sampling. The convergence of SSLS in TV-distance is analyzed under certain conditions, providing insights into error behavior related to hyper-parameters. Numerical examples demonstrate its outstanding performance in high-dimensional and nonlinear scenarios, as well as in situations with sparse or partial measurements. Furthermore, SSLS effectively quantifies the uncertainty associated with the estimated states, highlighting its potential for error calibration.
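The score-based Langevin update at the heart of such samplers is a one-liner. The sketch below runs unadjusted Langevin on a toy Gaussian prior-plus-observation target; the observation model, step size, and iteration count are illustrative assumptions, not the SSLS configuration:

```python
import numpy as np

def langevin_update(x, score, eps, rng):
    """One unadjusted Langevin step driven by a score function
    (a generic building block of score-based samplers; illustrative only)."""
    return x + eps * score(x) + np.sqrt(2.0 * eps) * rng.normal(size=x.shape)

# Toy target: standard Gaussian prior, one noisy linear observation y = Hx + noise.
rng = np.random.default_rng(0)
H, y, sigma = np.array([[1.0, 0.0]]), np.array([2.0]), 0.5
score = lambda x: -x + H.T @ (y - H @ x) / sigma**2   # grad log (prior * likelihood)

x = np.zeros(2)
samples = []
for _ in range(5000):
    x = langevin_update(x, score, eps=1e-2, rng=rng)
    samples.append(x.copy())
print(np.mean(samples[1000:], axis=0))   # rough posterior-mean estimate
```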
Cross submissions (showing 8 of 8 entries)
- [17] arXiv:2110.11383 (replaced) [pdf, html, other]
Title: Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic Algorithm for Constrained Markov Decision Processes
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
We consider a discounted cost constrained Markov decision process (CMDP) policy optimization problem, in which an agent seeks to maximize a discounted cumulative reward subject to a number of constraints on discounted cumulative utilities. To solve this constrained optimization program, we study an online actor-critic variant of a classic primal-dual method where the gradients of both the primal and dual functions are estimated using samples from a single trajectory generated by the underlying time-varying Markov processes. This online primal-dual natural actor-critic algorithm maintains and iteratively updates three variables: a dual variable (or Lagrangian multiplier), a primal variable (or actor), and a critic variable used to estimate the gradients of both primal and dual variables. These variables are updated simultaneously but on different time scales (using different step sizes) and they are all intertwined with each other. Our main contribution is to derive a finite-time analysis for the convergence of this algorithm to the global optimum of a CMDP problem. Specifically, we show that with a proper choice of step sizes the optimality gap and constraint violation converge to zero in expectation at a rate $\mathcal{O}(1/K^{1/6})$, where $K$ is the number of iterations. To our knowledge, this paper is the first to study the finite-time complexity of an online primal-dual actor-critic method for solving a CMDP problem. We also validate the effectiveness of this algorithm through numerical simulations.
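Schematically, the primal-dual approach replaces the constrained program with a saddle-point problem of the standard Lagrangian form below (generic notation; the paper's algorithm additionally runs a natural-gradient actor and a critic on separate time scales):

```latex
\[
\max_{\theta}\; V_r(\pi_\theta)\ \ \text{s.t.}\ \ V_{g_i}(\pi_\theta)\ge b_i,\ i=1,\dots,m
\qquad\Longleftrightarrow\qquad
\max_{\theta}\,\min_{\lambda\ge 0}\; V_r(\pi_\theta)+\sum_{i=1}^{m}\lambda_i\big(V_{g_i}(\pi_\theta)-b_i\big).
\]
```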
- [18] arXiv:2307.00780 (replaced) [pdf, html, other]
Title: Variational Theory and Algorithms for a Class of Asymptotically Approachable Nonconvex Problems
Comments: Added termination criteria and numerical experiments; Streamlined proofs
Subjects: Optimization and Control (math.OC)
We investigate a class of composite nonconvex functions, where the outer function is the sum of univariate extended-real-valued convex functions and the inner function is the limit of difference-of-convex functions. A notable feature of this class is that the inner function may fail to be locally Lipschitz continuous. It covers a range of important yet challenging applications, including inverse optimal value optimization and problems under value-at-risk constraints. We propose an asymptotic decomposition of the composite function that guarantees epi-convergence to the original function, leading to necessary optimality conditions for the corresponding minimization problem. The proposed decomposition also enables us to design a numerical algorithm such that any accumulation point of the generated sequence, if it exists, satisfies the newly introduced optimality conditions. These results expand on the study of so-called amenable functions introduced by Poliquin and Rockafellar in 1992, which are compositions of convex functions with smooth maps, and the prox-linear methods for their minimization. To demonstrate that our algorithmic framework is practically implementable, we further present verifiable termination criteria and preliminary numerical results.
- [19] arXiv:2404.02871 (replaced) [pdf, html, other]
Title: Existence and uniqueness results for a mean-field game of optimal investment
Subjects: Optimization and Control (math.OC); Theoretical Economics (econ.TH)
We establish the existence and uniqueness of the equilibrium for a stochastic mean-field game of optimal investment. The analysis covers both finite and infinite time horizons, and the mean-field interaction of the representative company with a mass of identical and indistinguishable firms is modeled through the time-dependent price at which the produced good is sold. At equilibrium, this price is given in terms of a nonlinear function of the expected (optimally controlled) production capacity of the representative company at each time. The proof of the existence and uniqueness of the mean-field equilibrium relies on a priori estimates and the study of nonlinear integral equations, but employs different techniques for the finite and infinite horizon cases. Additionally, we investigate the deterministic counterpart of the mean-field game under study.
- [20] arXiv:2405.15631 (replaced) [pdf, html, other]
Title: An analysis of Wardrop equilibrium and social optimum in congested transit networks
Subjects: Optimization and Control (math.OC)
The effective design and management of public transport systems are essential to ensure the best service for users. The performance of a transport system will depend heavily on user behaviour. In the common-lines problem approach, users choose which lines to use based on the best strategy for them. While Wardrop equilibrium has been studied for the common-lines problem, no contributions have been made towards achieving the social optimum. In this work, we propose two optimisation problems to obtain this optimum, using strategy flow and line flow formulations. We prove that both optimisation problems are equivalent, and we obtain a characterisation of the social optimum flows. The social optimum makes it possible to compute the price of anarchy (PoA), which quantifies the system's efficiency. The study of the PoA enables the effective design and management of public transport systems, guaranteeing the best service to users.
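The price of anarchy referred to above is the usual ratio of equilibrium cost to optimal cost:

```latex
\[
\mathrm{PoA}
= \frac{\displaystyle\max_{f\,\in\,\text{Wardrop equilibria}} C(f)}
       {\displaystyle\min_{f\ \text{feasible}} C(f)} \;\ge\; 1,
\]
```

where $C(f)$ denotes the total (social) cost of a flow $f$; a PoA close to 1 indicates that selfish line choices lose little efficiency relative to the social optimum.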
- [21] arXiv:2405.15894 (replaced) [pdf, html, other]
Title: Derivatives of Stochastic Gradient Descent in parametric optimization
Journal-ref: NeurIPS 2024
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
We consider stochastic optimization problems where the objective depends on some parameter, as commonly found in hyperparameter optimization for instance. We investigate the behavior of the derivatives of the iterates of Stochastic Gradient Descent (SGD) with respect to that parameter and show that they are driven by an inexact SGD recursion on a different objective function, perturbed by the convergence of the original SGD. This enables us to establish that the derivatives of SGD converge to the derivative of the solution mapping in terms of mean squared error whenever the objective is strongly convex. Specifically, we demonstrate that with constant step-sizes, these derivatives stabilize within a noise ball centered at the solution derivative, and that with vanishing step-sizes they exhibit $O(\log(k)^2 / k)$ convergence rates. Additionally, we prove exponential convergence in the interpolation regime. Our theoretical findings are illustrated by numerical experiments on synthetic tasks.
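The recursion alluded to above follows from formally differentiating the SGD update with respect to the parameter $\theta$ (schematic notation, assuming enough smoothness); the Jacobians of the iterates obey an inexact, linearized SGD-like iteration:

```latex
\[
x_{k+1} = x_k - \gamma\,\nabla_x f(x_k,\theta,\xi_k)
\quad\Longrightarrow\quad
\frac{\partial x_{k+1}}{\partial \theta}
= \Big(I - \gamma\,\nabla^2_{xx} f(x_k,\theta,\xi_k)\Big)\frac{\partial x_k}{\partial \theta}
- \gamma\,\nabla^2_{x\theta} f(x_k,\theta,\xi_k).
\]
```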
- [22] arXiv:2405.18926 (replaced) [pdf, html, other]
Title: Newton Method Revisited: Global Convergence Rates up to $\mathcal {O}\left(k^{-3} \right)$ for Stepsize Schedules and Linesearch Procedures
Comments: 11 pages
Subjects: Optimization and Control (math.OC)
This paper investigates the global convergence of stepsized Newton methods for convex functions with Hölder continuous Hessians or third derivatives. We propose several simple stepsize schedules with fast global convergence guarantees, up to $\mathcal {O}\left(k^{-3} \right)$. For cases with multiple plausible smoothness parameterizations or an unknown smoothness constant, we introduce a stepsize linesearch and a backtracking procedure with provable convergence as if the optimal smoothness parameters were known in advance. Additionally, we present strong convergence guarantees for the practically popular Newton method with exact linesearch.
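The iteration being scheduled is the globally stepsized (damped) Newton method; the contribution lies in the choice of $\alpha_k$ and the linesearch, not in the update itself:

```latex
\[
x_{k+1} = x_k - \alpha_k\,\big[\nabla^2 f(x_k)\big]^{-1}\nabla f(x_k), \qquad \alpha_k\in(0,1].
\]
```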
- [23] arXiv:2409.10301 (replaced) [pdf, html, other]
Title: Decomposition Pipeline for Large-Scale Portfolio Optimization with Applications to Near-Term Quantum Computing
Authors: Atithi Acharya, Romina Yalovetzky, Pierre Minssen, Shouvanik Chakrabarti, Ruslan Shaydulin, Rudy Raymond, Yue Sun, Dylan Herman, Ruben S. Andrist, Grant Salton, Martin J. A. Schuetz, Helmut G. Katzgraber, Marco Pistoia
Subjects: Optimization and Control (math.OC); Data Analysis, Statistics and Probability (physics.data-an); Portfolio Management (q-fin.PM); Risk Management (q-fin.RM); Quantum Physics (quant-ph)
Industrially relevant constrained optimization problems, such as portfolio optimization and portfolio rebalancing, are often intractable or difficult to solve exactly. In this work, we propose and benchmark a decomposition pipeline targeting portfolio optimization and rebalancing problems with constraints. The pipeline decomposes the optimization problem into constrained subproblems, which are then solved separately and aggregated to give a final result. Our pipeline includes three main components: preprocessing of correlation matrices based on random matrix theory, modified spectral clustering based on Newman's algorithm, and risk rebalancing. Our empirical results show that our pipeline consistently decomposes real-world portfolio optimization problems into subproblems with a size reduction of approximately 80%. Since subproblems are then solved independently, our pipeline drastically reduces the total computation time for state-of-the-art solvers. Moreover, by decomposing large problems into several smaller subproblems, the pipeline enables the use of near-term quantum devices as solvers, providing a path toward practical utility of quantum computers in portfolio optimization.
- [24] arXiv:2411.10592 (replaced) [pdf, html, other]
Title: A Systematic LMI Approach to Design Multivariable Sliding Mode Controllers
Comments: 8 pages, 4 figures
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
This paper deals with sliding mode control for multivariable polytopic uncertain systems. We provide systematic procedures to design variable structure controllers (VSCs) and unit-vector controllers (UVCs). Based on suitable representations for the closed-loop system, we derive sufficient conditions in the form of linear matrix inequalities (LMIs) to design the robust sliding mode controllers such that the origin of the closed-loop system is globally stable in finite time. Moreover, by noticing that the reaching time depends on the initial condition and the decay rate, we provide convex optimization problems to design robust controllers by considering the minimization of the reaching time associated with a given set of initial conditions. Two examples illustrate the effectiveness of the proposed approaches.
- [25] arXiv:2105.03681 (replaced) [pdf, html, other]
Title: Universal Online Convex Optimization Meets Second-order Bounds
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Recently, several universal methods have been proposed for online convex optimization, and attain minimax rates for multiple types of convex functions simultaneously. However, they need to design and optimize one surrogate loss for each type of functions, making it difficult to exploit the structure of the problem and utilize existing algorithms. In this paper, we propose a simple strategy for universal online convex optimization, which avoids these limitations. The key idea is to construct a set of experts to process the original online functions, and deploy a meta-algorithm over the linearized losses to aggregate predictions from experts. Specifically, the meta-algorithm is required to yield a second-order bound with excess losses, so that it can leverage strong convexity and exponential concavity to control the meta-regret. In this way, our strategy inherits the theoretical guarantee of any expert designed for strongly convex functions and exponentially concave functions, up to a double logarithmic factor. As a result, we can plug in off-the-shelf online solvers as black-box experts to deliver problem-dependent regret bounds. For general convex functions, it maintains the minimax optimality and also achieves a small-loss bound. Furthermore, we extend our universal strategy to online composite optimization, where the loss function comprises a time-varying function and a fixed regularizer. To deal with the composite loss functions, we employ a meta-algorithm based on the optimistic online learning framework, which not only possesses a second-order bound, but also can utilize estimations for upcoming loss functions. With appropriate configurations, we demonstrate that the additional regularizer does not contribute to the meta-regret, thus maintaining the universality in the composite setting.
- [26] arXiv:2207.05942 (replaced) [pdf, other]
Title: Syndrome decoding by quantum approximate optimization
Comments: with appendix, totally 16 figures
Journal-ref: Quantum Inf Process 23, 368 (2024)
Subjects: Quantum Physics (quant-ph); Optimization and Control (math.OC)
The syndrome decoding problem is known to be NP-complete. The goal of the decoder is to find an error of low weight that corresponds to a given syndrome obtained from a parity-check matrix. We use the quantum approximate optimization algorithm (QAOA) to address the syndrome decoding problem with elegantly-designed reward Hamiltonians based on both generator and check matrices for classical and quantum codes. We evaluate the level-4 check-based QAOA decoding of the [7,4,3] Hamming code, as well as the level-4 generator-based QAOA decoding of the [[5,1,3]] quantum code. Remarkably, the simulation results demonstrate that the decoding performances match those of the maximum likelihood decoding. Moreover, we explore the possibility of enhancing QAOA by introducing additional redundant clauses to a combinatorial optimization problem while keeping the number of qubits unchanged. Finally, we study QAOA decoding of degenerate quantum codes. Typically, conventional decoders aim to find a unique error of minimum weight that matches a given syndrome. However, our observations reveal that QAOA has the intriguing ability to identify degenerate errors of comparable weight, providing multiple potential solutions that match the given syndrome with comparable probabilities. This is illustrated through simulations of the generator-based QAOA decoding of the [[9,1,3]] Shor code on specific error syndromes.
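The underlying combinatorial problem handed to QAOA is the classical syndrome decoding problem:

```latex
\[
\text{given } H\in\mathbb{F}_2^{(n-k)\times n} \text{ and syndrome } s\in\mathbb{F}_2^{\,n-k}:
\qquad
\min_{e\in\mathbb{F}_2^{n}} \ \mathrm{wt}(e)
\quad\text{s.t.}\quad H e \equiv s \pmod 2 ,
\]
```

where $\mathrm{wt}(\cdot)$ is the Hamming weight; the reward Hamiltonians mentioned above encode this constrained minimization over bit strings.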
- [27] arXiv:2311.07633 (replaced) [pdf, html, other]
Title: Benchmarking PtO and PnO Methods in the Predictive Combinatorial Optimization Regime
Comments: NeurIPS 2024 Datasets and Benchmarks Track
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Predictive combinatorial optimization, where the parameters of combinatorial optimization (CO) are unknown at decision-making time, precisely models many real-world applications, including energy cost-aware scheduling and budget allocation in advertising. Tackling such a problem usually involves a prediction model and a CO solver. These two modules are integrated into the predictive CO pipeline following one of two design principles: "Predict-then-Optimize (PtO)", which learns predictions by supervised training and subsequently solves CO using the predicted coefficients, and "Predict-and-Optimize (PnO)", which directly optimizes towards the ultimate decision quality and claims to yield better decisions than traditional PtO approaches. However, there lacks a systematic benchmark of both approaches, covering the specific design choices at the module level as well as an evaluation dataset that covers representative real-world scenarios. To this end, we develop a modular framework to benchmark 11 existing PtO/PnO methods on 8 problems, including a new industrial dataset for combinatorial advertising that will be released. Our study shows that PnO approaches are better than PtO on 7 out of 8 benchmarks, but there is no silver bullet for the specific design choices of PnO. A comprehensive categorization of current approaches and an integration of typical scenarios are provided under a unified benchmark. Therefore, this paper can serve as a comprehensive benchmark for future PnO approach development and also offer fast prototyping for application-focused development. The code is available at this https URL.
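The two design principles differ only in what the training signal is. A toy sketch contrasting them (the miniature "solver" and tensors are hypothetical; in practice PnO needs a differentiable surrogate such as SPO+ or perturbation-based gradients, since the solver itself is not differentiable):

```python
import torch

def solver(c):
    """Toy CO 'solver': pick the 2 cheapest of the items (illustrative stand-in)."""
    z = torch.zeros_like(c)
    z[torch.topk(-c, 2).indices] = 1.0
    return z

def pto_loss(c_hat, c_true):
    """Predict-then-Optimize: the predictor is trained on prediction error only;
    the solver is applied afterwards with the predicted coefficients."""
    return torch.nn.functional.mse_loss(c_hat, c_true)

def pno_regret(c_hat, c_true):
    """Predict-and-Optimize target: the decision regret of the prediction-induced
    decision, i.e. how much worse the chosen decision is under the true costs."""
    return (c_true * (solver(c_hat) - solver(c_true))).sum()

c_true = torch.tensor([1.0, 2.0, 3.0, 4.0])
c_hat = torch.tensor([3.5, 2.1, 3.0, 0.5])   # inaccurate prediction, different decision
print(pto_loss(c_hat, c_true).item(), pno_regret(c_hat, c_true).item())
```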
- [28] arXiv:2402.10439 (replaced) [pdf, html, other]
Title: Competitive Equilibrium for Chores: from Dual Eisenberg-Gale to a Fast, Greedy, LP-based Algorithm
Comments: 39 pages, 50 figures
Subjects: Computer Science and Game Theory (cs.GT); Optimization and Control (math.OC)
We study the computation of competitive equilibrium for Fisher markets with $n$ agents and $m$ divisible chores. Competitive equilibria for chores are known to correspond to the nonzero KKT points of a program that minimizes the product of agent disutilities, which is a non-convex program whose zero points foil iterative optimization methods. We introduce a dual-like analogue of this program, and show that a simple modification to our "dual" program avoids such zero points, while retaining the correspondence between KKT points and competitive equilibria. This allows, for the first time ever, application of iterative optimization methods over a convex region for computing competitive equilibria for chores. We next introduce a greedy Frank-Wolfe algorithm for optimization over our program and show a new state-of-the-art convergence rate to competitive equilibrium. Moreover, our method is significantly simpler than prior methods: each iteration of our method only requires solving a simple linear program. We show through numerical experiments that our method is extremely practical: it easily solves every instance we tried, including instances with hundreds of agents and up to 1000 chores, usually in 10-30 iterations, is simple to implement, and has no numerical issues.
- [29] arXiv:2410.11061 (replaced) [pdf, html, other]
Title: Learning to Optimize for Mixed-Integer Non-linear Programming
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Mixed-integer non-linear programs (MINLPs) arise in various domains, such as energy systems and transportation, but are notoriously difficult to solve. Recent advances in machine learning have led to remarkable successes in optimization tasks, an area broadly known as learning to optimize. This approach includes using predictive models to generate solutions for optimization problems with continuous decision variables, thereby avoiding the need for computationally expensive optimization algorithms. However, applying learning to MINLPs remains challenging primarily due to the presence of integer decision variables, which complicate gradient-based learning. To address this limitation, we propose two differentiable correction layers that generate integer outputs while preserving gradient information. Combined with a soft penalty for constraint violation, our framework can tackle both the integrality and non-linear constraints in a MINLP. Experiments on three problem classes with convex/non-convex objective/constraints and integer/mixed-integer variables show that the proposed learning-based approach consistently produces high-quality solutions for parametric MINLPs extremely quickly. As problem size increases, traditional exact solvers and heuristic methods struggle to find feasible solutions, whereas our approach continues to deliver reliable results. Our work extends the scope of learning-to-optimize to MINLP, paving the way for integrating integer constraints into deep learning models. Our code is available at this https URL.
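One widely used way to obtain integer outputs while preserving gradient information is a straight-through rounding layer, sketched below; this illustrates the general idea only and is not the paper's two correction-layer designs:

```python
import torch

class StraightThroughRound(torch.nn.Module):
    """Round in the forward pass; pass gradients through unchanged in the
    backward pass (straight-through estimator). Illustrative sketch only."""
    def forward(self, x):
        return x + (torch.round(x) - x).detach()

layer = StraightThroughRound()
x = torch.tensor([0.2, 1.7, 2.6], requires_grad=True)
y = layer(x)          # integer-valued output
y.sum().backward()
print(y, x.grad)      # gradients of ones flow back to x despite the rounding
```

Combined with a soft penalty on constraint violation, such a layer lets a predictive model be trained end to end while emitting integer-feasible candidate solutions.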