{
  "type": "Article",
  "authors": [
    {
      "type": "Person",
      "familyNames": [
        "Beznosikov"
      ],
      "givenNames": [
        "Aleksandr"
      ]
    },
    {
      "type": "Person",
      "familyNames": [
        "Polyak"
      ],
      "givenNames": [
        "Boris"
      ]
    },
    {
      "type": "Person",
      "familyNames": [
        "Gorbunov"
      ],
      "givenNames": [
        "Eduard"
      ]
    },
    {
      "type": "Person",
      "familyNames": [
        "Kovalev"
      ],
      "givenNames": [
        "Dmitry"
      ]
    },
    {
      "type": "Person",
      "familyNames": [
        "Gasnikov"
      ],
      "givenNames": [
        "Alexander"
      ]
    }
  ],
  "description": "This paper is a survey of methods for solving smooth, (strongly) monotone stochastic variational inequalities.\nTo begin with, we present the deterministic foundation from which the stochastic methods eventually evolved.\nThen we review methods for the general stochastic formulation, and look at the finite-sum setup.\nThe last parts of the paper are devoted to various recent (not necessarily stochastic) advances in algorithms for variational inequalities.",
  "identifiers": [],
  "references": [
    {
      "type": "Article",
      "id": "bib-bib1",
      "authors": [],
      "title": "\nD. Adil, B. Bullins, A. Jambulapati and S. Sachdeva, Line search-free methods for higher-order smooth monotone variational inequalities, preprint, arXiv:2205.06167 (2022)\n",
      "url": "https://arxiv.org/abs/2205.06167"
    },
    {
      "type": "Article",
      "id": "bib-bib2",
      "authors": [],
      "title": "\nA. Alacaoglu and Y. Malitsky, Stochastic variance reduction for variational inequality methods, preprint, arXiv:2102.08352 (2021)\n",
      "url": "https://arxiv.org/abs/2102.08352"
    },
    {
      "type": "Article",
      "id": "bib-bib3",
      "authors": [],
      "title": "\nA. Alacaoglu, Y. Malitsky and V. Cevher, Forward-reflected-backward method with variance reduction. Comput. Optim. Appl. 80, 321–346 (2021)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib4",
      "authors": [],
      "title": "\nM. S. Alkousa, A. V. Gasnikov, D. M. Dvinskikh, D. A. Kovalev and F. S. Stonyakin, Accelerated methods for saddle-point problem.\nComput. Math. Math. Phys. 60, 1787–1809 (2020)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib5",
      "authors": [],
      "title": "\nK. Antonakopoulos, E. V. Belmega and P. Mertikopoulos, Adaptive extra-gradient methods for min-max optimization and games, preprint, arXiv:2010.12100 (2020)\n",
      "url": "https://arxiv.org/abs/2010.12100"
    },
    {
      "type": "Article",
      "id": "bib-bib6",
      "authors": [],
      "title": "\nK. J. Arrow, L. Hurwicz and H. Uzawa, Studies in linear and non-linear programming.\nStanford Mathematical Studies in the Social Sciences, II, Stanford University Press, Stanford (1958)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib7",
      "authors": [],
      "title": "\nW. Azizian, D. Scieur, I. Mitliagkas, S. Lacoste-Julien and G. Gidel, Accelerating smooth games by manipulating spectral shapes.\nIn International Conference on Artificial Intelligence and Statistics, Proc. Mach. Learn. Res., 1705–1715 (2020)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib8",
      "authors": [],
      "title": "\nF. Bach and K. Y. Levy, A universal algorithm for variational inequalities adaptive to smoothness and noise.\nIn Conference on Learning Theory, Proc. Mach. Learn. Res., 164–194 (2019)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib9",
      "authors": [],
      "title": "\nF. Bach, J. Mairal and J. Ponce, Convex sparse matrix factorizations, preprint, arXiv:0812.1869 (2008)\n",
      "url": "https://arxiv.org/abs/0812.1869"
    },
    {
      "type": "Article",
      "id": "bib-bib10",
      "authors": [],
      "title": "\nA. Bakushinskii and B. Polyak, On the solution of variational inequalities. Sov. Math. Dokl. 15, 1705–1710 (1974)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib11",
      "authors": [],
      "title": "\nA. Ben-Tal, L. El Ghaoui and A. Nemirovski, Robust optimization.\nPrinceton Ser. Appl. Math., Princeton University Press, Princeton (2009)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib12",
      "authors": [],
      "title": "\nA. Beznosikov, A. Alanov, D. Kovalev, M. Takáč and A. Gasnikov, On scaled methods for saddle point problems, preprint, arXiv:2206.08303 (2022)\n",
      "url": "https://arxiv.org/abs/2206.08303"
    },
    {
      "type": "Article",
      "id": "bib-bib13",
      "authors": [],
      "title": "\nA. Beznosikov, P. Dvurechensky, A. Koloskova, V. Samokhin, S. U. Stich and A. Gasnikov, Decentralized local stochastic extra-gradient for variational inequalities, preprint, arXiv:2106.08315 (2021)\n",
      "url": "https://arxiv.org/abs/2106.08315"
    },
    {
      "type": "Article",
      "id": "bib-bib14",
      "authors": [],
      "title": "\nA. Beznosikov, A. Gasnikov, K. Zainulina, A. Maslovskiy and D. Pasechnyuk, A unified analysis of variational inequality methods: Variance reduction, sampling, quantization and coordinate descent, preprint, arXiv:2201.12206 (2022)\n",
      "url": "https://arxiv.org/abs/2201.12206"
    },
    {
      "type": "Article",
      "id": "bib-bib15",
      "authors": [],
      "title": "\nA. Beznosikov, E. Gorbunov, H. Berard and N. Loizou, Stochastic gradient descent-ascent: Unified theory and new efficient methods, preprint, arXiv:2202.07262 (2022)\n",
      "url": "https://arxiv.org/abs/2202.07262"
    },
    {
      "type": "Article",
      "id": "bib-bib16",
      "authors": [],
      "title": "\nA. Beznosikov, A. Rogozin, D. Kovalev and A. Gasnikov, Near-optimal decentralized algorithms for saddle point problems over time-varying networks.\nIn Optimization and applications, Lecture Notes in Comput. Sci. 13078, Springer, Cham, 246–257 (2021)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib17",
      "authors": [],
      "title": "\nA. Beznosikov, V. Samokhin and A. Gasnikov, Distributed saddle-point problems: Lower bounds, optimal and robust algorithms, preprint, arXiv:2010.13112 (2020)\n",
      "url": "https://arxiv.org/abs/2010.13112"
    },
    {
      "type": "Article",
      "id": "bib-bib18",
      "authors": [],
      "title": "\nA. Beznosikov, G. Scutari, A. Rogozin and A. Gasnikov, Distributed saddle-point problems under data similarity.\nAdv. Neural Inf. Process. Syst. 34, 8172–8184 (2021)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib19",
      "authors": [],
      "title": "\nF. E. Browder, Existence and approximation of solutions of nonlinear variational inequalities.\nProc. Nat. Acad. Sci. U.S.A. 56, 1080–1086 (1966)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib20",
      "authors": [],
      "title": "\nX. Cai, C. Song, C. Guzmán and J. Diakonikolas, A stochastic Halpern iteration with variance reduction for stochastic monotone inclusion problems, preprint, arXiv:2203.09436 (2022)\n",
      "url": "https://arxiv.org/abs/2203.09436"
    },
    {
      "type": "Article",
      "id": "bib-bib21",
      "authors": [],
      "title": "\nY. Carmon, Y. Jin, A. Sidford and K. Tian, Coordinate methods for matrix games.\nIn 2020 IEEE 61st Annual Symposium on Foundations of Computer Science, IEEE Computer Soc., Los Alamitos, 283–293 (2020)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib22",
      "authors": [],
      "title": "\nA. Chambolle and T. Pock, A first-order primal-dual algorithm for convex problems with applications to imaging.\nJ. Math. Imaging Vision 40, 120–145 (2011)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib23",
      "authors": [],
      "title": "\nY. Carmon, D. Hausler, A. Jambulapati, Y. Jin and A. Sidford, Optimal and adaptive Monteiro–Svaiter acceleration, preprint, arXiv:2205.15371 (2022)\n",
      "url": "https://arxiv.org/abs/2205.15371"
    },
    {
      "type": "Article",
      "id": "bib-bib24",
      "authors": [],
      "title": "\nY. Carmon, A. Jambulapati, Y. Jin and A. Sidford, RECAPP: Crafting a more efficient catalyst for convex optimization.\nIn International Conference on Machine Learning, Proc. Mach. Learn. Res., 2658–2685 (2022)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib25",
      "authors": [],
      "title": "\nY. Carmon, Y. Jin, A. Sidford and K. Tian, Variance reduction for matrix games, preprint, arXiv:1907.02056 (2019)\n",
      "url": "https://arxiv.org/abs/1907.02056"
    },
    {
      "type": "Article",
      "id": "bib-bib26",
      "authors": [],
      "title": "\nT. Chavdarova, G. Gidel, F. Fleuret and S. Lacoste-Julien, Reducing noise in GAN training with variance reduced extragradient.\nAdv. Neural Inf. Process. Syst. 32, 393–403 (2019)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib27",
      "authors": [],
      "title": "\nL. Chen and L. Luo, Near-optimal algorithms for making the gradient small in stochastic minimax optimization, preprint, arXiv:2208.05925 (2022)\n",
      "url": "https://arxiv.org/abs/2208.05925"
    },
    {
      "type": "Article",
      "id": "bib-bib28",
      "authors": [],
      "title": "\nM. B. Cohen, A. Sidford and K. Tian, Relative Lipschitzness in extragradient methods and a direct recipe for acceleration.\nIn 12th Innovations in Theoretical Computer Science Conference, LIPIcs. Leibniz Int. Proc. Inform. 185, Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, Article No. 62 (2021)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib29",
      "authors": [],
      "title": "\nA. Defazio, F. Bach and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives.\nAdv. Neural Inf. Process. Syst. 27, 1646–1654 (2014)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib30",
      "authors": [],
      "title": "\nJ. Diakonikolas and P. Wang, Potential function-based framework for minimizing gradients in convex and min-max optimization.\nSIAM J. Optim. 32, 1668–1697 (2022)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib31",
      "authors": [],
      "title": "\nZ. Dou and Y. Li, On the one-sided convergence of Adam-type algorithms in non-convex non-concave min-max optimization, preprint, arXiv:2109.14213 (2021)\n",
      "url": "https://arxiv.org/abs/2109.14213"
    },
    {
      "type": "Article",
      "id": "bib-bib32",
      "authors": [],
      "title": "\nS. S. Du, G. Gidel, M. I. Jordan and C. J. Li, Optimal extragradient-based bilinearly-coupled saddle-point optimization, preprint, arXiv:2206.08573 (2022)\n",
      "url": "https://arxiv.org/abs/2206.08573"
    },
    {
      "type": "Article",
      "id": "bib-bib33",
      "authors": [],
      "title": "\nJ. Duchi, E. Hazan and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization.\nJ. Mach. Learn. Res. 12, 2121–2159 (2011)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib34",
      "authors": [],
      "title": "\nA. Ene and H. L. Nguyen, Adaptive and universal algorithms for variational inequalities with optimal convergence.\nIn Proceedings of the AAAI Conference on Artificial Intelligence, 36, 6559–6567 (2022)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib35",
      "authors": [],
      "title": "\nA. Ene, H. L. Nguyen and A. Vladu, Adaptive gradient methods for constrained convex optimization and variational inequalities.\nIn Proceedings of the AAAI Conference on Artificial Intelligence, 35, 7314–7321 (2021)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib36",
      "authors": [],
      "title": "\nE. Esser, X. Zhang and T. F. Chan, A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science.\nSIAM J. Imaging Sci. 3, 1015–1046 (2010)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib37",
      "authors": [],
      "title": "\nF. Facchinei and J.-S. Pang, Finite-dimensional variational inequalities and complementarity problems.\nSpringer Series in Operations Research, Springer, New York (2003)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib38",
      "authors": [],
      "title": "\nD. J. Foster, A. Sekhari, O. Shamir, N. Srebro, K. Sridharan and B. Woodworth, The complexity of making the gradient small in stochastic convex optimization.\nIn Conference on Learning Theory, Proc. Mach. Learn. Res., 1319–1345 (2019)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib39",
      "authors": [],
      "title": "\nA. Gasnikov, P. Dvurechensky, E. Gorbunov, E. Vorontsova, D. Selikhanovych, C. A. Uribe, B. Jiang, H. Wang, S. Zhang, S. Bubeck et al., Near optimal methods for minimizing convex functions with Lipschitz p-th derivatives.\nIn Conference on Learning Theory, Proc. Mach. Learn. Res., 1392–1393 (2019)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib40",
      "authors": [],
      "title": "\nA. V. Gasnikov, P. E. Dvurechensky, F. S. Stonyakin and A. A. Titov, An adaptive proximal method for variational inequalities.\nComput. Math. Math. Phys. 59, 836–841 (2019)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib41",
      "authors": [],
      "title": "\nG. Gidel, H. Berard, G. Vignoud, P. Vincent and S. Lacoste-Julien, A variational inequality perspective on generative adversarial networks, preprint, arXiv:1802.10551 (2018)\n",
      "url": "https://arxiv.org/abs/1802.10551"
    },
    {
      "type": "Article",
      "id": "bib-bib42",
      "authors": [],
      "title": "\nG. Gidel, R. A. Hemmat, M. Pezeshki, R. Le Priol, G. Huang, S. Lacoste-Julien and I. Mitliagkas, Negative momentum for improved game dynamics.\nIn The 22nd International Conference on Artificial Intelligence and Statistics, Proc. Mach. Learn. Res., 1802–1811 (2019)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib43",
      "authors": [],
      "title": "\nE. Gladin, I. Kuruzov, F. Stonyakin, D. Pasechnyuk, M. Alkousa and A. Gasnikov, Solving strongly convex-concave composite saddle point problems with a small dimension of one of the variables, preprint, arXiv:2010.02280 (2022)\n",
      "url": "https://arxiv.org/abs/2010.02280"
    },
    {
      "type": "Article",
      "id": "bib-bib44",
      "authors": [],
      "title": "\nE. Gladin, A. Sadiev, A. Gasnikov, P. Dvurechensky, A. Beznosikov and M. Alkousa, Solving smooth min-min and min-max problems by mixed oracle algorithms.\nIn Mathematical Optimization Theory and Operations Research—Recent Trends, Commun. Comput. Inf. Sci. 1476, Springer, Cham, 19–40 (2021)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib45",
      "authors": [],
      "title": "\nE. G. Gol’šteĭn, Convergence of the gradient method for finding the saddle points of modified Lagrangian functions.\nÈkonom. i Mat. Metody 13, 322–329 (1977)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib46",
      "authors": [],
      "title": "\nI. Goodfellow, Y. Bengio and A. Courville, Deep learning.\nAdaptive Computation and Machine Learning, MIT Press, Cambridge (2016)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib47",
      "authors": [],
      "title": "\nI. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville and Y. Bengio, Generative adversarial networks.\nCommun. ACM 63, 139–144 (2020)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib48",
      "authors": [],
      "title": "\nE. Gorbunov, H. Berard, G. Gidel and N. Loizou, Stochastic extragradient: General analysis and improved rates.\nIn International Conference on Artificial Intelligence and Statistics, Proc. Mach. Learn. Res., 7865–7901 (2022)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib49",
      "authors": [],
      "title": "\nE. Gorbunov, M. Danilova, D. Dobre, P. Dvurechensky, A. Gasnikov and G. Gidel, Clipped stochastic methods for variational inequalities with heavy-tailed noise, preprint, arXiv:2206.01095 (2022)\n",
      "url": "https://arxiv.org/abs/2206.01095"
    },
    {
      "type": "Article",
      "id": "bib-bib50",
      "authors": [],
      "title": "\nE. Gorbunov, M. Danilova and A. Gasnikov, Stochastic optimization with heavy-tailed noise via accelerated gradient clipping.\nAdv. Neural Inf. Process. Syst. 33, 15042–15053 (2020)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib51",
      "authors": [],
      "title": "\nE. Gorbunov, F. Hanzely and P. Richtárik, A unified theory of SGD: Variance reduction, sampling, quantization and coordinate descent.\nIn International Conference on Artificial Intelligence and Statistics, Proc. Mach. Learn. Res., 680–690 (2020)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib52",
      "authors": [],
      "title": "\nR. M. Gower, N. Loizou, X. Qian, A. Sailanbayev, E. Shulgin and P. Richtárik, SGD: General analysis and improved rates.\nIn Proceedings of the 36th International Conference on Machine Learning, Proc. Mach. Learn. Res. 97, 5200–5209 (2019)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib53",
      "authors": [],
      "title": "\nY. Han, G. Xie and Z. Zhang, Lower complexity bounds of finite-sum optimization problems: The results and construction, preprint, arXiv:2103.08280 (2021)\n",
      "url": "https://arxiv.org/abs/2103.08280"
    },
    {
      "type": "Article",
      "id": "bib-bib54",
      "authors": [],
      "title": "\nY.-G. Hsieh, F. Iutzeler, J. Malick and P. Mertikopoulos, On the convergence of single-call stochastic extra-gradient methods.\nAdv. Neural Inf. Process. Syst. 32, 6938–6948 (2019)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib55",
      "authors": [],
      "title": "\nY.-G. Hsieh, F. Iutzeler, J. Malick and P. Mertikopoulos, Explore aggressively, update conservatively: Stochastic extragradient methods with variable stepsize scaling.\nAdv. Neural Inf. Process. Syst. 33, 16223–16234 (2020)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib56",
      "authors": [],
      "title": "\nA. Ibrahim, W. Azizian, G. Gidel and I. Mitliagkas, Linear lower bounds and conditioning of differentiable games.\nIn International Conference on Machine Learning, Proc. Mach. Learn. Res., 4583–4593 (2020)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib57",
      "authors": [],
      "title": "\nY. Jin and A. Sidford, Efficiently solving MDPs with stochastic mirror descent.\nIn Proceedings of the 37th International Conference on Machine Learning (ICML), Proc. Mach. Learn. Res. 119, 4890–4900 (2020)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib58",
      "authors": [],
      "title": "\nY. Jin, A. Sidford and K. Tian, Sharper rates for separable minimax and finite sum optimization via primal-dual extragradient methods, preprint, arXiv:2202.04640 (2022)\n",
      "url": "https://arxiv.org/abs/2202.04640"
    },
    {
      "type": "Article",
      "id": "bib-bib59",
      "authors": [],
      "title": "\nR. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction.\nAdv. Neural Inf. Process. Syst. 26, 315–323 (2013)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib60",
      "authors": [],
      "title": "\nA. Juditsky, A. Nemirovski and C. Tauvel, Solving variational inequalities with stochastic mirror-prox algorithm.\nStoch. Syst. 1, 17–58 (2011)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib61",
      "authors": [],
      "title": "\nE. N. Khobotov, Modification of the extra-gradient method for solving variational inequalities and certain optimization problems.\nU.S.S.R. Comput. Math. Math. Phys. 27, 120–127 (1987)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib62",
      "authors": [],
      "title": "\nD. Kim and J. A. Fessler, Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions.\nJ. Optim. Theory Appl. 188, 192–219 (2021)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib63",
      "authors": [],
      "title": "\nD. P. Kingma and J. Ba, Adam: A method for stochastic optimization.\nIn International Conference on Learning Representations, 2305–2313 (2015)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib64",
      "authors": [],
      "title": "\nG. M. Korpelevič, An extragradient method for finding saddle points and for other problems.\nÈkonom. i Mat. Metody 12, 747–756 (1976)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib65",
      "authors": [],
      "title": "\nG. M. Korpelevich, Extrapolation gradient methods and their relation to modified Lagrange functions.\nÈkonom. i Mat. Metody 19, 694–703 (1983)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib66",
      "authors": [],
      "title": "\nG. Kotsalis, G. Lan and T. Li, Simple and optimal methods for stochastic variational inequalities. I: Operator extrapolation.\nSIAM J. Optim. 32, 2041–2073 (2022)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib67",
      "authors": [],
      "title": "\nG. Kotsalis, G. Lan and T. Li, Simple and optimal methods for stochastic variational inequalities. II: Markovian noise and policy evaluation in reinforcement learning.\nSIAM J. Optim. 32, 1120–1155 (2022)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib68",
      "authors": [],
      "title": "\nD. Kovalev, A. Beznosikov, E. Borodich, A. Gasnikov and G. Scutari, Optimal gradient sliding and its application to distributed optimization under similarity, preprint, arXiv:2205.15136 (2022)\n",
      "url": "https://arxiv.org/abs/2205.15136"
    },
    {
      "type": "Article",
      "id": "bib-bib69",
      "authors": [],
      "title": "\nD. Kovalev, A. Beznosikov, A. Sadiev, M. Persiianov, P. Richtárik and A. Gasnikov, Optimal algorithms for decentralized stochastic variational inequalities, preprint, arXiv:2202.02771 (2022)\n",
      "url": "https://arxiv.org/abs/2202.02771"
    },
    {
      "type": "Article",
      "id": "bib-bib70",
      "authors": [],
      "title": "\nD. Kovalev and A. Gasnikov, The first optimal acceleration of high-order methods in smooth convex optimization, preprint, arXiv:2205.09647 (2022)\n",
      "url": "https://arxiv.org/abs/2205.09647"
    },
    {
      "type": "Article",
      "id": "bib-bib71",
      "authors": [],
      "title": "\nD. Kovalev and A. Gasnikov, The first optimal algorithm for smooth and strongly-convex-strongly-concave minimax optimization, preprint, arXiv:2205.05653 (2022)\n",
      "url": "https://arxiv.org/abs/2205.05653"
    },
    {
      "type": "Article",
      "id": "bib-bib72",
      "authors": [],
      "title": "\nD. Kovalev, A. Gasnikov and P. Richtárik, Accelerated primal-dual gradient method for smooth and convex-concave saddle-point problems with bilinear coupling, preprint, arXiv:2112.15199 (2021)\n",
      "url": "https://arxiv.org/abs/2112.15199"
    },
    {
      "type": "Article",
      "id": "bib-bib73",
      "authors": [],
      "title": "\nD. Kovalev, S. Horváth and P. Richtárik, Don’t jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop.\nIn Proceedings of the 31st International Conference on Algorithmic Learning Theory, edited by A. Kontorovich and G. Neu, Proc. Mach. Learn. Res. 117, 451–467 (2020)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib74",
      "authors": [],
      "title": "\nS. Lee and D. Kim, Fast extra gradient methods for smooth structured nonconvex-nonconcave minimax problems.\nAdv. Neural Inf. Process. Syst. 34, 22588–22600 (2021)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib75",
      "authors": [],
      "title": "\nD. Lin, H. Ye and Z. Zhang, Explicit superlinear convergence rates of Broyden’s methods in nonlinear equations, preprint, arXiv:2109.01974 (2021)\n",
      "url": "https://arxiv.org/abs/2109.01974"
    },
    {
      "type": "Article",
      "id": "bib-bib76",
      "authors": [],
      "title": "\nH. Lin, J. Mairal and Z. Harchaoui, A universal catalyst for first-order optimization.\nAdv. Neural Inf. Process. Syst. 28, 3384–3392 (2015)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib77",
      "authors": [],
      "title": "\nT. Lin, C. Jin and M. I. Jordan, Near-optimal algorithms for minimax optimization.\nIn Conference on Learning Theory, Proc. Mach. Learn. Res., 2738–2779 (2020)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib78",
      "authors": [],
      "title": "\nT. Lin and M. I. Jordan, Perseus: A simple high-order regularization method for variational inequalities, preprint, arXiv:2205.03202 (2022)\n",
      "url": "https://arxiv.org/abs/2205.03202"
    },
    {
      "type": "Article",
      "id": "bib-bib79",
      "authors": [],
      "title": "\nC. Liu and L. Luo, Quasi-Newton methods for saddle point problems, preprint, arXiv:2111.02708 (2021)\n",
      "url": "https://arxiv.org/abs/2111.02708"
    },
    {
      "type": "Article",
      "id": "bib-bib80",
      "authors": [],
      "title": "\nM. Liu, Y. Mroueh, J. Ross, W. Zhang, X. Cui, P. Das and T. Yang, Towards better understanding of adaptive gradient algorithms in generative adversarial nets, preprint, arXiv:1912.11940 (2019)\n",
      "url": "https://arxiv.org/abs/1912.11940"
    },
    {
      "type": "Article",
      "id": "bib-bib81",
      "authors": [],
      "title": "\nN. Loizou, H. Berard, G. Gidel, I. Mitliagkas and S. Lacoste-Julien, Stochastic gradient descent-ascent and consensus optimization for smooth games: Convergence analysis under expected co-coercivity.\nAdv. Neural Inf. Process. Syst. 34, 19095–19108 (2021)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib82",
      "authors": [],
      "title": "\nL. Luo, G. Xie, T. Zhang and Z. Zhang, Near optimal stochastic algorithms for finite-sum unbalanced convex-concave minimax optimization, preprint, arXiv:2106.01761 (2021)\n",
      "url": "https://arxiv.org/abs/2106.01761"
    },
    {
      "type": "Article",
      "id": "bib-bib83",
      "authors": [],
      "title": "\nY. Malitsky and M. K. Tam, A forward-backward splitting method for monotone inclusions without cocoercivity.\nSIAM J. Optim. 30, 1451–1472 (2020)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib84",
      "authors": [],
      "title": "\nB. Martinet, Régularisation d’inéquations variationnelles par approximations successives.\nRev. Française Informat. Recherche Opérationnelle 4, 154–158 (1970)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib85",
      "authors": [],
      "title": "\nD. Metelev, A. Rogozin, A. Gasnikov and D. Kovalev, Decentralized saddle-point problems with different constants of strong convexity and strong concavity, preprint, arXiv:2206.00090 (2022)\n",
      "url": "https://arxiv.org/abs/2206.00090"
    },
    {
      "type": "Article",
      "id": "bib-bib86",
      "authors": [],
      "title": "\nK. Mishchenko, D. Kovalev, E. Shulgin, P. Richtárik and Y. Malitsky, Revisiting stochastic extragradient.\nIn International Conference on Artificial Intelligence and Statistics, Proc. Mach. Learn. Res., 4573–4582 (2020)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib87",
      "authors": [],
      "title": "\nA. Mokhtari, A. Ozdaglar and S. Pattathil, A unified analysis of extra-gradient and optimistic gradient methods for saddle point problems: Proximal point approach.\nIn International Conference on Artificial Intelligence and Statistics, Proc. Mach. Learn. Res., 1497–1507 (2020)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib88",
      "authors": [],
      "title": "\nR. D. C. Monteiro and B. F. Svaiter, Iteration-complexity of a Newton proximal extragradient method for monotone variational inequalities and inclusion problems.\nSIAM J. Optim. 22, 914–935 (2012)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib89",
      "authors": [],
      "title": "\nR. D. C. Monteiro and B. F. Svaiter, An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods.\nSIAM J. Optim. 23, 1092–1125 (2013)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib90",
      "authors": [],
      "title": "\nS. Mukherjee and M. Chakraborty, A decentralized algorithm for large scale min-max problems.\nIn 2020 59th IEEE Conference on Decision and Control (CDC), 2967–2972 (2020)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib91",
      "authors": [],
      "title": "\nA. Nemirovski, Efficient methods in convex programming. Lecture notes, https://www2.isye.gatech.edu/~nemirovs/Lect_EMCO.pdf (1994)\n",
      "url": "https://www2.isye.gatech.edu/~nemirovs/Lect_EMCO.pdf"
    },
    {
      "type": "Article",
      "id": "bib-bib92",
      "authors": [],
      "title": "\nA. Nemirovski, Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems.\nSIAM J. Optim. 15, 229–251 (2004)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib93",
      "authors": [],
      "title": "\nY. Nesterov, Smooth minimization of non-smooth functions.\nMath. Program. 103, 127–152 (2005)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib94",
      "authors": [],
      "title": "\nY. Nesterov, Cubic regularization of Newton’s method for convex problems with constraints.\nCORE Discussion Paper No. 2006/39, https://ssrn.com/abstract=921825 (2006)\n",
      "url": "https://ssrn.com/abstract=921825"
    },
    {
      "type": "Article",
      "id": "bib-bib95",
      "authors": [],
      "title": "\nY. Nesterov, Dual extrapolation and its applications to solving variational inequalities and related problems.\nMath. Program. 109, 319–344 (2007)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib96",
      "authors": [],
      "title": "\nY. Nesterov, How to make the gradients small.\nOptima. Mathematical Optimization Society Newsletter 88, 10–11 (2012)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib97",
      "authors": [],
      "title": "\nY. Nesterov, Implementable tensor methods in unconstrained convex optimization.\nMath. Program. 186, 157–183 (2021)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib98",
      "authors": [],
      "title": "\nY. Nesterov, A. Gasnikov, S. Guminov and P. Dvurechensky, Primal-dual accelerated gradient methods with small-dimensional relaxation oracle.\nOptim. Methods Softw. 36, 773–810 (2021)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib99",
      "authors": [],
      "title": "\nY. Nesterov and B. T. Polyak, Cubic regularization of Newton method and its global performance.\nMath. Program. 108, 177–205 (2006)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib100",
      "authors": [],
      "title": "\nS. Omidshafiei, J. Pazis, C. Amato, J. P. How and J. Vian, Deep decentralized multi-task multi-agent reinforcement learning under partial observability.\nIn Proceedings of the 34th International Conference on Machine Learning (ICML), Proc. Mach. Learn. Res. 70, 2681–2690 (2017)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib101",
      "authors": [],
      "title": "\nY. Ouyang and Y. Xu, Lower complexity bounds of first-order methods for convex-concave bilinear saddle-point problems.\nMath. Program. 185, 1–35 (2021)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib102",
      "authors": [],
      "title": "\nB. Palaniappan and F. Bach, Stochastic variance reduction methods for saddle-point problems.\nAdv. Neural Inf. Process. Syst., 1416–1424 (2016)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib103",
      "authors": [],
      "title": "\nR. Pascanu, T. Mikolov and Y. Bengio, On the difficulty of training recurrent neural networks.\nIn International Conference on Machine Learning, 1310–1318 (2013)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib104",
      "authors": [],
      "title": "\nB. T. Polyak, Introduction to optimization. Translations Series in Mathematics and Engineering, Optimization Software, Inc., Publications Division, New York (1987)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib105",
      "authors": [],
      "title": "\nL. D. Popov, A modification of the Arrow–Hurwicz method for search of saddle points.\nMath. Notes 28, 845–848 (1981)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib106",
      "authors": [],
      "title": "\nR. T. Rockafellar, Convex functions, monotone operators and variational inequalities.\nIn Theory and Applications of Monotone Operators (Proc. NATO Advanced Study Inst., Venice, 1968), Edizioni “Oderisi”, Gubbio, 35–65 (1969)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib107",
      "authors": [],
      "title": "\nR. T. Rockafellar, Monotone operators and the proximal point algorithm.\nSIAM J. Control Optim. 14, 877–898 (1976)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib108",
      "authors": [],
      "title": "\nA. Rogozin, A. Beznosikov, D. Dvinskikh, D. Kovalev, P. Dvurechensky and A. Gasnikov, Decentralized distributed optimization for saddle point problems, preprint, arXiv:2102.07758 (2021)\n",
      "url": "https://arxiv.org/abs/2102.07758"
    },
    {
      "type": "Article",
      "id": "bib-bib109",
      "authors": [],
      "title": "\nA. Sadiev, D. Kovalev and P. Richtárik, Communication acceleration of local gradient methods via an accelerated primal-dual algorithm with inexact prox. arXiv:2207.03957 (2022)\n",
      "url": "https://arxiv.org/abs/2207.03957"
    },
    {
      "type": "Article",
      "id": "bib-bib110",
      "authors": [],
      "title": "\nM. Sibony, Méthodes itératives pour les équations et inéquations aux dérivées partielles non linéaires de type monotone.\nCalcolo 7, 65–183 (1970)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib111",
      "authors": [],
      "title": "\nC. Song, C. Y. Lin, S. J. Wright and J. Diakonikolas, Coordinate linear variance reduction for generalized linear programming, preprint, arXiv:2111.01842 (2021)\n",
      "url": "https://arxiv.org/abs/2111.01842"
    },
    {
      "type": "Article",
      "id": "bib-bib112",
      "authors": [],
      "title": "\nC. Song, S. J. Wright and J. Diakonikolas, Variance reduction via primal-dual accelerated dual averaging for nonsmooth convex finite-sums.\nIn International Conference on Machine Learning, Proc. Mach. Learn. Res., 9824–9834 (2021)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib113",
      "authors": [],
      "title": "\nG. Stampacchia, Formes bilinéaires coercitives sur les ensembles convexes.\nC. R. Acad. Sci. Paris 258, 4413–4416 (1964)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib114",
      "authors": [],
      "title": "\nF. Stonyakin, A. Gasnikov, P. Dvurechensky, A. Titov and M. Alkousa, Generalized mirror prox algorithm for monotone variational inequalities: Universality and inexact oracle.\nJ. Optim. Theory Appl. 194, 988–1013 (2022)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib115",
      "authors": [],
      "title": "\nF. Stonyakin, A. Tyurin, A. Gasnikov, P. Dvurechensky, A. Agafonov, D. Dvinskikh, M. Alkousa, D. Pasechnyuk, S. Artamonov and V. Piskunova, Inexact model: A framework for optimization and variational inequalities.\nOptim. Methods Softw. 36, 1155–1201 (2021)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib116",
      "authors": [],
      "title": "\nK. K. Thekumparampil, N. He and S. Oh, Lifted primal-dual method for bilinearly coupled smooth minimax optimization, preprint, arXiv:2201.07427 (2022)\n",
      "url": "https://arxiv.org/abs/2201.07427"
    },
    {
      "type": "Article",
      "id": "bib-bib117",
      "authors": [],
      "title": "\nA. A. Titov, S. S. Ablaev, M. S. Alkousa, F. S. Stonyakin and A. V. Gasnikov, Some adaptive first-order methods for variational inequalities with relatively strongly monotone operators and generalized smoothness, preprint, arXiv:2207.09544 (2022)\n",
      "url": "https://arxiv.org/abs/2207.09544"
    },
    {
      "type": "Article",
      "id": "bib-bib118",
      "authors": [],
      "title": "\nV. Tominin, Y. Tominin, E. Borodich, D. Kovalev, A. Gasnikov and P. Dvurechensky, On accelerated methods for saddle-point problems with composite structure, preprint, arXiv:2103.09344 (2021)\n",
      "url": "https://arxiv.org/abs/2103.09344"
    },
    {
      "type": "Article",
      "id": "bib-bib119",
      "authors": [],
      "title": "\nP. Tseng, On linear convergence of iterative methods for the variational inequality problem.\nJ. Comput. Appl. Math. 60, 237–252 (1995)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib120",
      "authors": [],
      "title": "\nP. Tseng, A modified forward-backward splitting method for maximal monotone mappings.\nSIAM J. Control Optim. 38, 431–446 (2000)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib121",
      "authors": [],
      "title": "\nH. Ye, D. Lin and Z. Zhang, Greedy and random Broyden’s methods with explicit superlinear convergence rates in nonlinear equations, preprint, arXiv:2110.08572 (2021)\n",
      "url": "https://arxiv.org/abs/2110.08572"
    },
    {
      "type": "Article",
      "id": "bib-bib122",
      "authors": [],
      "title": "\nT. Yoon and E. K. Ryu, Accelerated algorithms for smooth convex-concave minimax problems with o⁢(1/k2) rate on squared gradient norm.\nIn International Conference on Machine Learning, Proc. Mach. Learn. Res., 12098–12109 (2021)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib123",
      "authors": [],
      "title": "\nG. Zhang, Y. Wang, L. Lessard and R. B. Grosse, Near-optimal local convergence of alternating gradient descent-ascent for minimax optimization.\nIn International Conference on Artificial Intelligence and Statistics, Proc. Mach. Learn. Res., 7659–7679 (2022)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib124",
      "authors": [],
      "title": "\nJ. Zhang, M. Hong and S. Zhang, On lower iteration complexity bounds for the convex concave saddle point problems.\nMath. Program. 194, 901–935 (2022)\n"
    },
    {
      "type": "Article",
      "id": "bib-bib125",
      "authors": [],
      "title": "\nX. Zhang, N. S. Aybat and M. Gurbuzbalaban, Robust accelerated primal-dual methods for computing saddle points, preprint, arXiv:2111.12743 (2021)\n",
      "url": "https://arxiv.org/abs/2111.12743"
    }
  ],
  "title": "Smooth monotone stochastic variational inequalities and saddle point problems: A survey",
  "meta": {},
  "content": [
    {
      "type": "Heading",
      "id": "S1",
      "depth": 1,
      "content": [
        "1 Introduction"
      ]
    },
    {
      "type": "Paragraph",
      "id": "S1.p1",
      "content": [
        "In its long, more than half-century history of study (going back to the classical article [",
        {
          "type": "Cite",
          "target": "bib-bib113",
          "content": [
            "113"
          ]
        },
        "]), variational inequalities have become one of the most popular and universal optimization formulations.\nVariational inequalities are used in various areas of applied mathematics.\nHere we can highlight both classical examples from game theory, economics, operator theory, convex analysis [",
        {
          "type": "Cite",
          "target": "bib-bib6",
          "content": [
            "6"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib19",
          "content": [
            "19"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib106",
          "content": [
            "106"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib110",
          "content": [
            "110"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib113",
          "content": [
            "113"
          ]
        },
        "], as well as newer and even more recent applications in optimization and machine learning: non-smooth optimization [",
        {
          "type": "Cite",
          "target": "bib-bib93",
          "content": [
            "93"
          ]
        },
        "], unsupervised learning [",
        {
          "type": "Cite",
          "target": "bib-bib9",
          "content": [
            "9"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib36",
          "content": [
            "36"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib22",
          "content": [
            "22"
          ]
        },
        "], robust/adversarial optimization [",
        {
          "type": "Cite",
          "target": "bib-bib11",
          "content": [
            "11"
          ]
        },
        "], GANs [",
        {
          "type": "Cite",
          "target": "bib-bib47",
          "content": [
            "47"
          ]
        },
        "] and reinforcement learning [",
        {
          "type": "Cite",
          "target": "bib-bib100",
          "content": [
            "100"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib57",
          "content": [
            "57"
          ]
        },
        "].\nModern times present new challenges to the community.\nThe increase in scale of problems and the need to speed up solution processes have sparked a huge interest in ",
        {
          "type": "Emphasis",
          "content": [
            "stochastic"
          ]
        },
        " formulations of applied tasks, including variational inequalities.\nThis paper surveys stochastic methods for solving variational inequalities."
      ]
    },
    {
      "type": "Heading",
      "id": "S1",
      "depth": 1,
      "content": [
        "Structure of the paper"
      ]
    },
    {
      "type": "Paragraph",
      "id": "S1.SS0.SSS0.Px1.p1",
      "content": [
        "In Section ",
        {
          "type": "Cite",
          "target": "S2",
          "content": [
            "2"
          ]
        },
        ", we give a formal statement of the variational inequality problem, basic examples, and main assumptions.\nSection ",
        {
          "type": "Cite",
          "target": "S3",
          "content": [
            "3"
          ]
        },
        " deals with deterministic methods, from which stochastic methods have evolved.\nSection ",
        {
          "type": "Cite",
          "target": "S4",
          "content": [
            "4"
          ]
        },
        " covers a variety of stochastic methods.\nSection ",
        {
          "type": "Cite",
          "target": "S5",
          "content": [
            "5"
          ]
        },
        " is devoted to the recent advances in (not necessarily stochastic) variational inequalities and saddle point problems."
      ]
    },
    {
      "type": "Heading",
      "id": "S2",
      "depth": 1,
      "content": [
        "2 Problem: Setting and assumptions"
      ]
    },
    {
      "type": "Claim",
      "id": "Thmnotationx1",
      "label": "Notation.",
      "title": [
        {
          "type": "Strong",
          "content": [
            "Notation"
          ]
        },
        {
          "type": "Strong",
          "content": [
            "."
          ]
        }
      ],
      "content": [
        {
          "type": "Paragraph",
          "id": "Thmnotationx1.p1",
          "content": [
            "We use ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thmnotationx1.p1.m1\" alttext=\"\\langle x,y\\rangle≔\\sum_{i=1}^{d}x_{i}y_{i}\" display=\"inline\"><mml:mrow><mml:mrow><mml:mo stretchy=\"false\">⟨</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy=\"false\">⟩</mml:mo></mml:mrow><mml:mo>⁢</mml:mo><mml:mi mathvariant=\"normal\">≔</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>d</mml:mi></mml:msubsup><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mrow></mml:math>",
              "meta": {
                "altText": "\\langle x,y\\rangle≔\\sum_{i=1}^{d}x_{i}y_{i}"
              }
            },
            " to denote the standard inner product of vectors ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thmnotationx1.p1.m2\" alttext=\"x,y\\in\\mathbb{R}^{d}\" display=\"inline\"><mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo>∈</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:mi>d</mml:mi></mml:msup></mml:mrow></mml:math>",
              "meta": {
                "altText": "x,y\\in\\mathbb{R}^{d}"
              }
            },
            ", where ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thmnotationx1.p1.m3\" alttext=\"x_{i}\" display=\"inline\"><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math>",
              "meta": {
                "altText": "x_{i}"
              }
            },
            " is the ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thmnotationx1.p1.m4\" alttext=\"i\" display=\"inline\"><mml:mi>i</mml:mi></mml:math>",
              "meta": {
                "altText": "i"
              }
            },
            "-th component of ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thmnotationx1.p1.m5\" alttext=\"x\" display=\"inline\"><mml:mi>x</mml:mi></mml:math>",
              "meta": {
                "altText": "x"
              }
            },
            " in the standard basis of ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thmnotationx1.p1.m6\" alttext=\"\\mathbb{R}^{d}\" display=\"inline\"><mml:msup><mml:mi>ℝ</mml:mi><mml:mi>d</mml:mi></mml:msup></mml:math>",
              "meta": {
                "altText": "\\mathbb{R}^{d}"
              }
            },
            ".\nIt induces the ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thmnotationx1.p1.m7\" alttext=\"\\ell_{2}\" display=\"inline\"><mml:msub><mml:mi mathvariant=\"normal\">ℓ</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:math>",
              "meta": {
                "altText": "\\ell_{2}"
              }
            },
            "-norm in ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thmnotationx1.p1.m8\" alttext=\"\\mathbb{R}^{d}\" display=\"inline\"><mml:msup><mml:mi>ℝ</mml:mi><mml:mi>d</mml:mi></mml:msup></mml:math>",
              "meta": {
                "altText": "\\mathbb{R}^{d}"
              }
            },
            " by ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thmnotationx1.p1.m9\" alttext=\"\\lVert x\\rVert_{2}≔\\sqrt{\\langle x,x\\rangle}\" display=\"inline\"><mml:mrow><mml:msub><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mi>x</mml:mi><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msub><mml:mo>⁢</mml:mo><mml:mi mathvariant=\"normal\">≔</mml:mi><mml:mo>⁢</mml:mo><mml:msqrt><mml:mrow><mml:mo stretchy=\"false\">⟨</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy=\"false\">⟩</mml:mo></mml:mrow></mml:msqrt></mml:mrow></mml:math>",
              "meta": {
                "altText": "\\lVert x\\rVert_{2}≔\\sqrt{\\langle x,x\\rangle}"
              }
            },
            ".\nWe denote the ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thmnotationx1.p1.m10\" alttext=\"\\ell_{p}\" display=\"inline\"><mml:msub><mml:mi mathvariant=\"normal\">ℓ</mml:mi><mml:mi>p</mml:mi></mml:msub></mml:math>",
              "meta": {
                "altText": "\\ell_{p}"
              }
            },
            "-norm by ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thmnotationx1.p1.m11\" alttext=\"\\lVert x\\rVert_{p}≔\\bigl(\\sum_{i=1}^{d}\\lvert x_{i}\\rvert^{p}\\bigr)^{1/{p}}\" display=\"inline\"><mml:mrow><mml:msub><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mi>x</mml:mi><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mi>p</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mi mathvariant=\"normal\">≔</mml:mi><mml:mo>⁢</mml:mo><mml:msup><mml:mrow><mml:mo maxsize=\"120%\" minsize=\"120%\">(</mml:mo><mml:mrow><mml:msubsup><mml:mo lspace=\"0em\" rspace=\"0em\">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>d</mml:mi></mml:msubsup><mml:msup><mml:mrow><mml:mo stretchy=\"false\">|</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy=\"false\">|</mml:mo></mml:mrow><mml:mi>p</mml:mi></mml:msup></mml:mrow><mml:mo maxsize=\"120%\" minsize=\"120%\">)</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mi>p</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math>",
              "meta": {
                "altText": "\\lVert x\\rVert_{p}≔\\bigl(\\sum_{i=1}^{d}\\lvert x_{i}\\rvert^{p}\\bigr)^{1/{p}}"
              }
            },
            " for ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thmnotationx1.p1.m12\" alttext=\"p\\in[1,\\infty)\" display=\"inline\"><mml:mrow><mml:mi>p</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mo stretchy=\"false\">[</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant=\"normal\">∞</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
              "meta": {
                "altText": "p\\in[1,\\infty)"
              }
            },
            ", and ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thmnotationx1.p1.m13\" alttext=\"\\lVert x\\rVert_{\\infty}≔\\max_{1\\leq i\\leq d}\\lvert x_{i}\\rvert\" display=\"inline\"><mml:mrow><mml:msub><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mi>x</mml:mi><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mi mathvariant=\"normal\">∞</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mi mathvariant=\"normal\">≔</mml:mi><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:msub><mml:mi>max</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>≤</mml:mo><mml:mi>i</mml:mi><mml:mo>≤</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">|</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy=\"false\">|</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>",
              "meta": {
                "altText": "\\lVert x\\rVert_{\\infty}≔\\max_{1\\leq i\\leq d}\\lvert x_{i}\\rvert"
              }
            },
            " for ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thmnotationx1.p1.m14\" alttext=\"p=\\infty\" display=\"inline\"><mml:mrow><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant=\"normal\">∞</mml:mi></mml:mrow></mml:math>",
              "meta": {
                "altText": "p=\\infty"
              }
            },
            ".\nThe dual norm ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thmnotationx1.p1.m15\" alttext=\"\\lVert\\cdot\\rVert_{*}\" display=\"inline\"><mml:msub><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mo lspace=\"0em\" rspace=\"0em\">⋅</mml:mo><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mo>*</mml:mo></mml:msub></mml:math>",
              "meta": {
                "altText": "\\lVert\\cdot\\rVert_{*}"
              }
            },
            " corresponding to the norm ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thmnotationx1.p1.m16\" alttext=\"\\lVert\\cdot\\rVert\" display=\"inline\"><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mo lspace=\"0em\" rspace=\"0em\">⋅</mml:mo><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow></mml:math>",
              "meta": {
                "altText": "\\lVert\\cdot\\rVert"
              }
            },
            " is defined by ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thmnotationx1.p1.m17\" alttext=\"\\lVert y\\rVert_{*}≔\\max\\{\\langle x,y\\rangle\\mid\\lVert x\\rVert\\leq 1\\}\" display=\"inline\"><mml:mrow><mml:msub><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mi>y</mml:mi><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mo>*</mml:mo></mml:msub><mml:mo>⁢</mml:mo><mml:mi mathvariant=\"normal\">≔</mml:mi><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mi>max</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">{</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy=\"false\">⟨</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy=\"false\">⟩</mml:mo></mml:mrow><mml:mo rspace=\"0.1389em\">∣</mml:mo><mml:mrow><mml:mo fence=\"true\" lspace=\"0.1389em\" rspace=\"0em\">∥</mml:mo><mml:mi>x</mml:mi><mml:mo fence=\"true\" lspace=\"0em\" rspace=\"0.1389em\">∥</mml:mo></mml:mrow></mml:mrow><mml:mo lspace=\"0.1389em\">≤</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy=\"false\">}</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>",
              "meta": {
                "altText": "\\lVert y\\rVert_{*}≔\\max\\{\\langle x,y\\rangle\\mid\\lVert x\\rVert\\leq 1\\}"
              }
            },
            ".\nThe symbol ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thmnotationx1.p1.m18\" alttext=\"\\mathbb{E}[\\cdot]\" display=\"inline\"><mml:mrow><mml:mi>𝔼</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">[</mml:mo><mml:mo lspace=\"0em\" rspace=\"0em\">⋅</mml:mo><mml:mo stretchy=\"false\">]</mml:mo></mml:mrow></mml:mrow></mml:math>",
              "meta": {
                "altText": "\\mathbb{E}[\\cdot]"
              }
            },
            " stands for the total mathematical expectation.\nFinally, we need to introduce the symbols ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thmnotationx1.p1.m19\" alttext=\"\\mathcal{O}\" display=\"inline\"><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi></mml:math>",
              "meta": {
                "altText": "\\mathcal{O}"
              }
            },
            " and ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thmnotationx1.p1.m20\" alttext=\"\\Omega\" display=\"inline\"><mml:mi mathvariant=\"normal\">Ω</mml:mi></mml:math>",
              "meta": {
                "altText": "\\Omega"
              }
            },
            " to enclose numerical constants that do not depend on any parameters of the problem, and the symbols ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thmnotationx1.p1.m21\" alttext=\"\\tilde{\\mathcal{O}}\" display=\"inline\"><mml:mover accent=\"true\"><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>~</mml:mo></mml:mover></mml:math>",
              "meta": {
                "altText": "\\tilde{\\mathcal{O}}"
              }
            },
            " and ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thmnotationx1.p1.m22\" alttext=\"\\tilde{\\Omega}\" display=\"inline\"><mml:mover accent=\"true\"><mml:mi mathvariant=\"normal\">Ω</mml:mi><mml:mo>~</mml:mo></mml:mover></mml:math>",
              "meta": {
                "altText": "\\tilde{\\Omega}"
              }
            },
            " to enclose numerical constants and logarithmic factors."
          ]
        }
      ]
    },
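    {
      "type": "Paragraph",
      "content": [
        "As an illustration of the dual norm, consider the following worked example. It is an addition based on Hölder's inequality, and the conjugate exponent ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "q",
          "meta": {
            "altText": "q"
          }
        },
        " is a symbol introduced only for this example. With the convention ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "1/\\infty=0",
          "meta": {
            "altText": "1/\\infty=0"
          }
        },
        ", the dual of the ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\ell_{p}",
          "meta": {
            "altText": "\\ell_{p}"
          }
        },
        "-norm is the ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\ell_{q}",
          "meta": {
            "altText": "\\ell_{q}"
          }
        },
        "-norm:"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "tex",
      "text": "\\max\\{\\langle x,y\\rangle\\mid\\lVert x\\rVert_{p}\\leq 1\\}=\\lVert y\\rVert_{q},\\qquad\\frac{1}{p}+\\frac{1}{q}=1,\\quad p\\in[1,\\infty].",
      "meta": {
        "altText": "\\max\\{\\langle x,y\\rangle\\mid\\lVert x\\rVert_{p}\\leq 1\\}=\\lVert y\\rVert_{q},\\qquad\\frac{1}{p}+\\frac{1}{q}=1,\\quad p\\in[1,\\infty]."
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "In particular, the ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\ell_{1}",
          "meta": {
            "altText": "\\ell_{1}"
          }
        },
        "- and ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\ell_{\\infty}",
          "meta": {
            "altText": "\\ell_{\\infty}"
          }
        },
        "-norms are dual to each other, and the ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\ell_{2}",
          "meta": {
            "altText": "\\ell_{2}"
          }
        },
        "-norm is its own dual."
      ]
    },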
    {
      "type": "Paragraph",
      "content": [
        "We study variational inequalities (VI) of the form"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S2.E1",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.E1.m1\" alttext=\"\\textrm{find}\\ z^{*}\\in\\mathcal{Z}\\ \\textrm{such that}\\ \\langle F(z^{*}),z-z^{*}\\rangle\\geq 0\\quad\\forall z\\in\\mathcal{Z},\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:mrow><mml:mtext>find</mml:mtext><mml:mo lspace=\"0.500em\">⁢</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:mrow><mml:mo>∈</mml:mo><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mo lspace=\"0.500em\">⁢</mml:mo><mml:mtext>such that</mml:mtext><mml:mo lspace=\"0.500em\">⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">⟨</mml:mo><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi>z</mml:mi><mml:mo>−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:mrow><mml:mo stretchy=\"false\">⟩</mml:mo></mml:mrow></mml:mrow><mml:mo>≥</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mspace width=\"1.167em\"/><mml:mrow><mml:mrow><mml:mo rspace=\"0.167em\">∀</mml:mo><mml:mi>z</mml:mi></mml:mrow><mml:mo>∈</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\textrm{find}\\ z^{*}\\in\\mathcal{Z}\\ \\textrm{such that}\\ \\langle F(z^{*}),z-z^{*}\\rangle\\geq 0\\quad\\forall z\\in\\mathcal{Z},"
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "where ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p1.m1\" alttext=\"F\\colon\\mathcal{Z}\\to\\mathbb{R}^{d}\" display=\"inline\"><mml:mrow><mml:mi>F</mml:mi><mml:mo lspace=\"0.278em\" rspace=\"0.278em\">:</mml:mo><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mo stretchy=\"false\">→</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:mi>d</mml:mi></mml:msup></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "F\\colon\\mathcal{Z}\\to\\mathbb{R}^{d}"
          }
        },
        " is an operator and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p1.m2\" alttext=\"\\mathcal{Z}\\subseteq\\mathbb{R}^{d}\" display=\"inline\"><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mo>⊆</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:mi>d</mml:mi></mml:msup></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\mathcal{Z}\\subseteq\\mathbb{R}^{d}"
          }
        },
        " is a convex set."
      ]
    },
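    {
      "type": "Paragraph",
      "content": [
        "As a simple special case (a remark added here for illustration, not taken from the surrounding text), suppose that ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\mathcal{Z}=\\mathbb{R}^{d}",
          "meta": {
            "altText": "\\mathcal{Z}=\\mathbb{R}^{d}"
          }
        },
        ". Substituting the test point ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "z=z^{*}-F(z^{*})",
          "meta": {
            "altText": "z=z^{*}-F(z^{*})"
          }
        },
        " into (",
        {
          "type": "Cite",
          "target": "S2-E1",
          "content": [
            "1"
          ]
        },
        ") shows that the variational inequality then reduces to an equation:"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "tex",
      "text": "0\\leq\\langle F(z^{*}),z-z^{*}\\rangle=-\\lVert F(z^{*})\\rVert_{2}^{2}\\quad\\Longrightarrow\\quad F(z^{*})=0.",
      "meta": {
        "altText": "0\\leq\\langle F(z^{*}),z-z^{*}\\rangle=-\\lVert F(z^{*})\\rVert_{2}^{2}\\quad\\Longrightarrow\\quad F(z^{*})=0."
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "Conversely, any point ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "z^{*}",
          "meta": {
            "altText": "z^{*}"
          }
        },
        " with ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "F(z^{*})=0",
          "meta": {
            "altText": "F(z^{*})=0"
          }
        },
        " satisfies (",
        {
          "type": "Cite",
          "target": "S2-E1",
          "content": [
            "1"
          ]
        },
        ") trivially, since the inner product vanishes for every test point."
      ]
    },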
    {
      "type": "Paragraph",
      "id": "S2.p2",
      "content": [
        "To emphasize the extensiveness of formulation (",
        {
          "type": "Cite",
          "target": "S2-E1",
          "content": [
            "1"
          ]
        },
        "), we give a few examples of variational inequalities arising in applied sciences."
      ]
    },
    {
      "type": "Claim",
      "id": "Thm112example1",
      "label": "Example 1(Minimization).",
      "title": [
        {
          "type": "Strong",
          "content": [
            "Example 1"
          ]
        },
        {
          "type": "Strong",
          "content": []
        },
        "(Minimization)",
        {
          "type": "Strong",
          "content": [
            "."
          ]
        }
      ],
      "content": [
        {
          "type": "Paragraph",
          "content": [
            "Consider the minimization problem"
          ]
        },
        {
          "type": "MathBlock",
          "id": "S2.E2",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.E2.m1\" alttext=\"\\min_{z\\in\\mathcal{Z}}f(z).\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:munder><mml:mi>min</mml:mi><mml:mrow><mml:mi>z</mml:mi><mml:mo>∈</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:mrow></mml:munder><mml:mo lspace=\"0.167em\">⁡</mml:mo><mml:mi>f</mml:mi></mml:mrow><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\min_{z\\in\\mathcal{Z}}f(z)."
          }
        },
        {
          "type": "Paragraph",
          "content": [
            "Let ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112example1.p1.m1\" alttext=\"F(z)≔\\nabla f(z)\" display=\"inline\"><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow><mml:mo>⁢</mml:mo><mml:mi mathvariant=\"normal\">≔</mml:mi><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mo rspace=\"0.167em\">∇</mml:mo><mml:mi>f</mml:mi></mml:mrow><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
              "meta": {
                "altText": "F(z)≔\\nabla f(z)"
              }
            },
            ".\nThen, if ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112example1.p1.m2\" alttext=\"f\" display=\"inline\"><mml:mi>f</mml:mi></mml:math>",
              "meta": {
                "altText": "f"
              }
            },
            " is convex, one can prove that ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112example1.p1.m3\" alttext=\"z^{*}\\in\\mathcal{Z}\" display=\"inline\"><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mo>∈</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:mrow></mml:math>",
              "meta": {
                "altText": "z^{*}\\in\\mathcal{Z}"
              }
            },
            " is a solution of problem (",
            {
              "type": "Cite",
              "target": "S2-E1",
              "content": [
                "1"
              ]
            },
            ") if and only if ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112example1.p1.m4\" alttext=\"z^{*}\\in\\mathcal{Z}\" display=\"inline\"><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mo>∈</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:mrow></mml:math>",
              "meta": {
                "altText": "z^{*}\\in\\mathcal{Z}"
              }
            },
            " is a solution of problem (",
            {
              "type": "Cite",
              "target": "S2-E2",
              "content": [
                "2"
              ]
            },
            ")."
          ]
        }
      ]
    },
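    {
      "type": "Paragraph",
      "content": [
        "The equivalence claimed in Example 1 can be sketched in two lines. The sketch assumes that problem (",
        {
          "type": "Cite",
          "target": "S2-E1",
          "content": [
            "1"
          ]
        },
        ") has the standard form ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\langle F(z^{*}),z-z^{*}\\rangle\\geq 0",
          "meta": {
            "altText": "\\langle F(z^{*}),z-z^{*}\\rangle\\geq 0"
          }
        },
        " for all ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "z\\in\\mathcal{Z}",
          "meta": {
            "altText": "z\\in\\mathcal{Z}"
          }
        },
        ", that ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\mathcal{Z}",
          "meta": {
            "altText": "\\mathcal{Z}"
          }
        },
        " is convex, and that ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "f",
          "meta": {
            "altText": "f"
          }
        },
        " is differentiable; none of these is restated in this section. For every ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "z\\in\\mathcal{Z}",
          "meta": {
            "altText": "z\\in\\mathcal{Z}"
          }
        },
        ", the first line uses convexity of ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "f",
          "meta": {
            "altText": "f"
          }
        },
        " followed by the variational inequality, and the second uses that ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "z^{*}",
          "meta": {
            "altText": "z^{*}"
          }
        },
        " minimizes ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "f",
          "meta": {
            "altText": "f"
          }
        },
        " along the segment from ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "z^{*}",
          "meta": {
            "altText": "z^{*}"
          }
        },
        " to ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "z",
          "meta": {
            "altText": "z"
          }
        },
        ":"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "tex",
      "text": "\\begin{aligned} &\\text{(1)}\\Rightarrow\\text{(2)}: & f(z) &\\geq f(z^{*})+\\langle\\nabla f(z^{*}),z-z^{*}\\rangle\\geq f(z^{*}),\\\\ &\\text{(2)}\\Rightarrow\\text{(1)}: & 0 &\\leq\\lim_{t\\to 0+}\\frac{f(z^{*}+t(z-z^{*}))-f(z^{*})}{t}=\\langle\\nabla f(z^{*}),z-z^{*}\\rangle. \\end{aligned}",
      "meta": {
        "altText": "\\begin{aligned} &\\text{(1)}\\Rightarrow\\text{(2)}: & f(z) &\\geq f(z^{*})+\\langle\\nabla f(z^{*}),z-z^{*}\\rangle\\geq f(z^{*}),\\\\ &\\text{(2)}\\Rightarrow\\text{(1)}: & 0 &\\leq\\lim_{t\\to 0+}\\frac{f(z^{*}+t(z-z^{*}))-f(z^{*})}{t}=\\langle\\nabla f(z^{*}),z-z^{*}\\rangle. \\end{aligned}"
      }
    },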
    {
      "type": "Claim",
      "id": "Thm112example2",
      "label": "Example 2 (Saddle point problem).",
      "title": [
        {
          "type": "Strong",
          "content": [
            "Example 2"
          ]
        },
        {
          "type": "Strong",
          "content": []
        },
        "(Saddle point problem)",
        {
          "type": "Strong",
          "content": [
            "."
          ]
        }
      ],
      "content": [
        {
          "type": "Paragraph",
          "content": [
            "Consider the saddle point problem (SPP)"
          ]
        },
        {
          "type": "MathBlock",
          "id": "S2.E3",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.E3.m1\" alttext=\"\\min_{x\\in\\mathcal{X}}\\max_{y\\in\\mathcal{Y}}g(x,y).\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:munder><mml:mi>min</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mo>∈</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒳</mml:mi></mml:mrow></mml:munder><mml:mo lspace=\"0.167em\">⁡</mml:mo><mml:mrow><mml:munder><mml:mi>max</mml:mi><mml:mrow><mml:mi>y</mml:mi><mml:mo>∈</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒴</mml:mi></mml:mrow></mml:munder><mml:mo lspace=\"0.167em\">⁡</mml:mo><mml:mi>g</mml:mi></mml:mrow></mml:mrow><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\min_{x\\in\\mathcal{X}}\\max_{y\\in\\mathcal{Y}}g(x,y)."
          }
        },
        {
          "type": "Paragraph",
          "content": [
            "Suppose that ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112example2.p1.m1\" alttext=\"F(z)≔F(x,y)=[\\nabla_{x}g(x,y),-\\nabla_{y}g(x,y)]\" display=\"inline\"><mml:mrow><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow><mml:mo>⁢</mml:mo><mml:mi mathvariant=\"normal\">≔</mml:mi><mml:mo>⁢</mml:mo><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy=\"false\">[</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mo rspace=\"0.167em\">∇</mml:mo><mml:mi>x</mml:mi></mml:msub><mml:mi>g</mml:mi></mml:mrow><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mo rspace=\"0.167em\">−</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mo rspace=\"0.167em\">∇</mml:mo><mml:mi>y</mml:mi></mml:msub><mml:mi>g</mml:mi></mml:mrow><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">]</mml:mo></mml:mrow></mml:mrow></mml:math>",
              "meta": {
                "altText": "F(z)≔F(x,y)=[\\nabla_{x}g(x,y),-\\nabla_{y}g(x,y)]"
              }
            },
            " and ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112example2.p1.m2\" alttext=\"\\mathcal{Z}=\\mathcal{X}\\times\\mathcal{Y}\" display=\"inline\"><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒳</mml:mi><mml:mo lspace=\"0.222em\" rspace=\"0.222em\">×</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒴</mml:mi></mml:mrow></mml:mrow></mml:math>",
              "meta": {
                "altText": "\\mathcal{Z}=\\mathcal{X}\\times\\mathcal{Y}"
              }
            },
            " with ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112example2.p1.m3\" alttext=\"\\mathcal{X}\\subseteq\\mathbb{R}^{d_{x}}\" display=\"inline\"><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒳</mml:mi><mml:mo>⊆</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mi>x</mml:mi></mml:msub></mml:msup></mml:mrow></mml:math>",
              "meta": {
                "altText": "\\mathcal{X}\\subseteq\\mathbb{R}^{d_{x}}"
              }
            },
            ", ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112example2.p1.m4\" alttext=\"\\mathcal{Y}\\subseteq\\mathbb{R}^{d_{y}}\" display=\"inline\"><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒴</mml:mi><mml:mo>⊆</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:msup></mml:mrow></mml:math>",
              "meta": {
                "altText": "\\mathcal{Y}\\subseteq\\mathbb{R}^{d_{y}}"
              }
            },
            ".\nThen, if ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112example2.p1.m5\" alttext=\"g\" display=\"inline\"><mml:mi>g</mml:mi></mml:math>",
              "meta": {
                "altText": "g"
              }
            },
            " is convex-concave, one can prove that ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112example2.p1.m6\" alttext=\"z^{*}\\in\\mathcal{Z}\" display=\"inline\"><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mo>∈</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:mrow></mml:math>",
              "meta": {
                "altText": "z^{*}\\in\\mathcal{Z}"
              }
            },
            " is a solution of problem (",
            {
              "type": "Cite",
              "target": "S2-E1",
              "content": [
                "1"
              ]
            },
            ") if and only if ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112example2.p1.m7\" alttext=\"z^{*}\\in\\mathcal{Z}\" display=\"inline\"><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mo>∈</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:mrow></mml:math>",
              "meta": {
                "altText": "z^{*}\\in\\mathcal{Z}"
              }
            },
            " is a solution of problem (",
            {
              "type": "Cite",
              "target": "S2-E3",
              "content": [
                "3"
              ]
            },
            ")."
          ]
        }
      ]
    },
    {
      "type": "Paragraph",
      "id": "S2.p3",
      "content": [
        "Accordingly, saddle point problems are often studied within the framework of variational inequalities."
      ]
    },
    {
      "type": "Claim",
      "id": "Thm112example3",
      "label": "Example 3 (Fixed point problem).",
      "title": [
        {
          "type": "Strong",
          "content": [
            "Example 3"
          ]
        },
        {
          "type": "Strong",
          "content": []
        },
        "(Fixed point problem)",
        {
          "type": "Strong",
          "content": [
            "."
          ]
        }
      ],
      "content": [
        {
          "type": "Paragraph",
          "content": [
            "Consider the fixed point problem"
          ]
        },
        {
          "type": "MathBlock",
          "id": "S2.E4",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.E4.m1\" alttext=\"\\textrm{find}\\ z^{*}\\in\\mathbb{R}^{d}\\ \\textrm{such that}\\ T(z^{*})=z^{*},\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:mtext>find</mml:mtext><mml:mo lspace=\"0.500em\">⁢</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:mrow><mml:mo>∈</mml:mo><mml:mrow><mml:msup><mml:mi>ℝ</mml:mi><mml:mi>d</mml:mi></mml:msup><mml:mo>⁢</mml:mo><mml:mtext>such that</mml:mtext><mml:mo lspace=\"0.500em\">⁢</mml:mo><mml:mi>T</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\textrm{find}\\ z^{*}\\in\\mathbb{R}^{d}\\ \\textrm{such that}\\ T(z^{*})=z^{*},"
          }
        },
        {
          "type": "Paragraph",
          "content": [
            "where ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112example3.p1.m1\" alttext=\"T\\colon\\mathbb{R}^{d}\\to\\mathbb{R}^{d}\" display=\"inline\"><mml:mrow><mml:mi>T</mml:mi><mml:mo lspace=\"0.278em\" rspace=\"0.278em\">:</mml:mo><mml:mrow><mml:msup><mml:mi>ℝ</mml:mi><mml:mi>d</mml:mi></mml:msup><mml:mo stretchy=\"false\">→</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:mi>d</mml:mi></mml:msup></mml:mrow></mml:mrow></mml:math>",
              "meta": {
                "altText": "T\\colon\\mathbb{R}^{d}\\to\\mathbb{R}^{d}"
              }
            },
            " is an operator.\nIf we set ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112example3.p1.m2\" alttext=\"F(z)=z-T(z)\" display=\"inline\"><mml:mrow><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mi>z</mml:mi><mml:mo>−</mml:mo><mml:mrow><mml:mi>T</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math>",
              "meta": {
                "altText": "F(z)=z-T(z)"
              }
            },
            ", then one can prove that ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112example3.p1.m3\" alttext=\"z^{*}\\in\\mathcal{Z}=\\mathbb{R}^{d}\" display=\"inline\"><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mo>∈</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:mi>d</mml:mi></mml:msup></mml:mrow></mml:math>",
              "meta": {
                "altText": "z^{*}\\in\\mathcal{Z}=\\mathbb{R}^{d}"
              }
            },
            " is a solution of problem (",
            {
              "type": "Cite",
              "target": "S2-E1",
              "content": [
                "1"
              ]
            },
            ") if and only if ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112example3.p1.m4\" alttext=\"F(z^{*})=0\" display=\"inline\"><mml:mrow><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math>",
              "meta": {
                "altText": "F(z^{*})=0"
              }
            },
            ", i.e., ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112example3.p1.m5\" alttext=\"z^{*}\\in\\mathbb{R}^{d}\" display=\"inline\"><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mo>∈</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:mi>d</mml:mi></mml:msup></mml:mrow></mml:math>",
              "meta": {
                "altText": "z^{*}\\in\\mathbb{R}^{d}"
              }
            },
            " is a solution of problem (",
            {
              "type": "Cite",
              "target": "S2-E4",
              "content": [
                "4"
              ]
            },
            ")."
          ]
        }
      ]
    },
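    {
      "type": "Paragraph",
      "content": [
        "The reduction in Example 3 can be verified in one line. As a sketch, assume that problem (",
        {
          "type": "Cite",
          "target": "S2-E1",
          "content": [
            "1"
          ]
        },
        ") has the standard form ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\langle F(z^{*}),z-z^{*}\\rangle\\geq 0",
          "meta": {
            "altText": "\\langle F(z^{*}),z-z^{*}\\rangle\\geq 0"
          }
        },
        " for all ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "z\\in\\mathcal{Z}",
          "meta": {
            "altText": "z\\in\\mathcal{Z}"
          }
        },
        " (its statement is not repeated in this section). Because ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\mathcal{Z}=\\mathbb{R}^{d}",
          "meta": {
            "altText": "\\mathcal{Z}=\\mathbb{R}^{d}"
          }
        },
        ", the test point ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "z=z^{*}-F(z^{*})",
          "meta": {
            "altText": "z=z^{*}-F(z^{*})"
          }
        },
        " is admissible, and substituting it gives"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "tex",
      "text": "0\\leq\\langle F(z^{*}),(z^{*}-F(z^{*}))-z^{*}\\rangle=-\\lVert F(z^{*})\\rVert_{2}^{2}\\quad\\Longrightarrow\\quad F(z^{*})=0\\quad\\Longleftrightarrow\\quad T(z^{*})=z^{*}.",
      "meta": {
        "altText": "0\\leq\\langle F(z^{*}),(z^{*}-F(z^{*}))-z^{*}\\rangle=-\\lVert F(z^{*})\\rVert_{2}^{2}\\quad\\Longrightarrow\\quad F(z^{*})=0\\quad\\Longleftrightarrow\\quad T(z^{*})=z^{*}."
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "The converse is immediate: if ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "F(z^{*})=0",
          "meta": {
            "altText": "F(z^{*})=0"
          }
        },
        ", the inner product vanishes for every ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "z",
          "meta": {
            "altText": "z"
          }
        },
        "."
      ]
    },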
    {
      "type": "Paragraph",
      "id": "S2.p4",
      "content": [
        "For the operator ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p4.m1\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
          "meta": {
            "altText": "F"
          }
        },
        " from (",
        {
          "type": "Cite",
          "target": "S2-E1",
          "content": [
            "1"
          ]
        },
        ") we assume the following."
      ]
    },
    {
      "type": "Claim",
      "id": "Thm112assumption1",
      "label": "Assumption 1 (Lipschitzness).",
      "title": [
        {
          "type": "Strong",
          "content": [
            "Assumption 1"
          ]
        },
        {
          "type": "Strong",
          "content": []
        },
        "(Lipschitzness)",
        {
          "type": "Strong",
          "content": [
            "."
          ]
        }
      ],
      "content": [
        {
          "type": "Paragraph",
          "id": "Thm112assumption1.p1",
          "content": [
            "The operator ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption1.p1.m1\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
              "meta": {
                "altText": "F"
              }
            },
            " is ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption1.p1.m2\" alttext=\"L\" display=\"inline\"><mml:mi>L</mml:mi></mml:math>",
              "meta": {
                "altText": "L"
              }
            },
            "-Lipschitz continuous, i.e., for all ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption1.p1.m3\" alttext=\"u,v\\in\\mathcal{Z}\" display=\"inline\"><mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo>∈</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:mrow></mml:math>",
              "meta": {
                "altText": "u,v\\in\\mathcal{Z}"
              }
            },
            ", we have ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption1.p1.m4\" alttext=\"\\lVert F(u)-F(v)\\rVert_{*}\\leq L\\lVert u-\\nobreak v\\rVert\" display=\"inline\"><mml:mrow><mml:msub><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>u</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>v</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mo>*</mml:mo></mml:msub><mml:mo>≤</mml:mo><mml:mrow><mml:mi>L</mml:mi><mml:mo lspace=\"0em\">⁢</mml:mo><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>−</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>",
              "meta": {
                "altText": "\\lVert F(u)-F(v)\\rVert_{*}\\leq L\\lVert u-\\nobreak v\\rVert"
              }
            },
            "."
          ]
        }
      ]
    },
    {
      "type": "Paragraph",
      "id": "S2.p5",
      "content": [
        "In the context of problems (",
        {
          "type": "Cite",
          "target": "S2-E2",
          "content": [
            "2"
          ]
        },
        ") and (",
        {
          "type": "Cite",
          "target": "S2-E3",
          "content": [
            "3"
          ]
        },
        "), ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p5.m1\" alttext=\"L\" display=\"inline\"><mml:mi>L</mml:mi></mml:math>",
          "meta": {
            "altText": "L"
          }
        },
        "-Lipschitzness of the operator means that the functions ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p5.m2\" alttext=\"f(z)\" display=\"inline\"><mml:mrow><mml:mi>f</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "f(z)"
          }
        },
        " and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p5.m3\" alttext=\"g(x,y)\" display=\"inline\"><mml:mrow><mml:mi>g</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "g(x,y)"
          }
        },
        " are ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p5.m4\" alttext=\"L\" display=\"inline\"><mml:mi>L</mml:mi></mml:math>",
          "meta": {
            "altText": "L"
          }
        },
        "-smooth."
      ]
    },
    {
      "type": "Claim",
      "id": "Thm112assumption2",
      "label": "Assumption 2 (Strong monotonicity).",
      "title": [
        {
          "type": "Strong",
          "content": [
            "Assumption 2"
          ]
        },
        {
          "type": "Strong",
          "content": []
        },
        "(Strong monotonicity)",
        {
          "type": "Strong",
          "content": [
            "."
          ]
        }
      ],
      "content": [
        {
          "type": "Paragraph",
          "id": "Thm112assumption2.p1",
          "content": [
            "The operator ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption2.p1.m1\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
              "meta": {
                "altText": "F"
              }
            },
            " is ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption2.p1.m2\" alttext=\"\\mu\" display=\"inline\"><mml:mi>μ</mml:mi></mml:math>",
              "meta": {
                "altText": "\\mu"
              }
            },
            "-strongly monotone, i.e., for all ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption2.p1.m3\" alttext=\"u,v\\in\\mathcal{Z}\" display=\"inline\"><mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo>∈</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:mrow></mml:math>",
              "meta": {
                "altText": "u,v\\in\\mathcal{Z}"
              }
            },
            ", we have ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption2.p1.m4\" alttext=\"\\langle F(u)-F(v),u-v\\rangle\\geq\\mu\\lVert u-v\\rVert^{2}_{2}\" display=\"inline\"><mml:mrow><mml:mrow><mml:mo stretchy=\"false\">⟨</mml:mo><mml:mrow><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>u</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>v</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>−</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo stretchy=\"false\">⟩</mml:mo></mml:mrow><mml:mo>≥</mml:mo><mml:mrow><mml:mi>μ</mml:mi><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msubsup><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>−</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mrow></mml:math>",
              "meta": {
                "altText": "\\langle F(u)-F(v),u-v\\rangle\\geq\\mu\\lVert u-v\\rVert^{2}_{2}"
              }
            },
            ".\nIf ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption2.p1.m5\" alttext=\"\\mu=0\" display=\"inline\"><mml:mrow><mml:mi>μ</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math>",
              "meta": {
                "altText": "\\mu=0"
              }
            },
            ", then the operator ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption2.p1.m6\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
              "meta": {
                "altText": "F"
              }
            },
            " is monotone."
          ]
        }
      ]
    },
    {
      "type": "Paragraph",
      "id": "S2.p6",
      "content": [
        "In the context of problems (",
        {
          "type": "Cite",
          "target": "S2-E2",
          "content": [
            "2"
          ]
        },
        ") and (",
        {
          "type": "Cite",
          "target": "S2-E3",
          "content": [
            "3"
          ]
        },
        "), strong monotonicity of ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p6.m1\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
          "meta": {
            "altText": "F"
          }
        },
        " means strong convexity of ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p6.m2\" alttext=\"f(z)\" display=\"inline\"><mml:mrow><mml:mi>f</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "f(z)"
          }
        },
        " and strong convexity-strong concavity of ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p6.m3\" alttext=\"g(x,y)\" display=\"inline\"><mml:mrow><mml:mi>g</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "g(x,y)"
          }
        },
        ".\nIn this paper we focus primarily on the strongly monotone and monotone cases.\nThere are, however, various assumptions that relax monotonicity and strong monotonicity (see, e.g., [",
        {
          "type": "Cite",
          "target": "bib-bib55",
          "content": [
            "55"
          ]
        },
        "] and references therein)."
      ]
    },
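    {
      "type": "Paragraph",
      "content": [
        "As a quick illustration of Assumptions 1 and 2, consider a bilinear saddle point problem. This example is chosen here for concreteness and is not taken from the text above: the matrix ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "A\\in\\mathbb{R}^{d_{x}\\times d_{y}}",
          "meta": {
            "altText": "A\\in\\mathbb{R}^{d_{x}\\times d_{y}}"
          }
        },
        " is a hypothetical parameter, the norm is taken to be Euclidean, and the problem is unconstrained. Let ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "g(x,y)=x^{\\top}Ay",
          "meta": {
            "altText": "g(x,y)=x^{\\top}Ay"
          }
        },
        " with ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\mathcal{X}=\\mathbb{R}^{d_{x}}",
          "meta": {
            "altText": "\\mathcal{X}=\\mathbb{R}^{d_{x}}"
          }
        },
        ", ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\mathcal{Y}=\\mathbb{R}^{d_{y}}",
          "meta": {
            "altText": "\\mathcal{Y}=\\mathbb{R}^{d_{y}}"
          }
        },
        ", and write ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "u=(x_{1},y_{1})",
          "meta": {
            "altText": "u=(x_{1},y_{1})"
          }
        },
        ", ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "v=(x_{2},y_{2})",
          "meta": {
            "altText": "v=(x_{2},y_{2})"
          }
        },
        ". Then"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "tex",
      "text": "\\begin{aligned} F(x,y) &= [Ay,-A^{\\top}x],\\\\ \\langle F(u)-F(v),u-v\\rangle &= \\langle A(y_{1}-y_{2}),x_{1}-x_{2}\\rangle-\\langle A^{\\top}(x_{1}-x_{2}),y_{1}-y_{2}\\rangle=0,\\\\ \\lVert F(u)-F(v)\\rVert_{2}^{2} &= \\lVert A(y_{1}-y_{2})\\rVert_{2}^{2}+\\lVert A^{\\top}(x_{1}-x_{2})\\rVert_{2}^{2}\\leq\\lVert A\\rVert_{2}^{2}\\,\\lVert u-v\\rVert_{2}^{2}. \\end{aligned}",
      "meta": {
        "altText": "\\begin{aligned} F(x,y) &= [Ay,-A^{\\top}x],\\\\ \\langle F(u)-F(v),u-v\\rangle &= \\langle A(y_{1}-y_{2}),x_{1}-x_{2}\\rangle-\\langle A^{\\top}(x_{1}-x_{2}),y_{1}-y_{2}\\rangle=0,\\\\ \\lVert F(u)-F(v)\\rVert_{2}^{2} &= \\lVert A(y_{1}-y_{2})\\rVert_{2}^{2}+\\lVert A^{\\top}(x_{1}-x_{2})\\rVert_{2}^{2}\\leq\\lVert A\\rVert_{2}^{2}\\,\\lVert u-v\\rVert_{2}^{2}. \\end{aligned}"
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "Hence this operator is monotone (Assumption 2 holds with ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\mu=0",
          "meta": {
            "altText": "\\mu=0"
          }
        },
        ") but not strongly monotone, and it satisfies Assumption 1 in the Euclidean norm with ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "L=\\lVert A\\rVert_{2}",
          "meta": {
            "altText": "L=\\lVert A\\rVert_{2}"
          }
        },
        ", the spectral norm of ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "A",
          "meta": {
            "altText": "A"
          }
        },
        "."
      ]
    },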
    {
      "type": "Paragraph",
      "id": "S2.p7",
      "content": [
        "We note that Assumptions ",
        {
          "type": "Cite",
          "target": "Thm112assumption1",
          "content": [
            "1"
          ]
        },
        " and ",
        {
          "type": "Cite",
          "target": "Thm112assumption2",
          "content": [
            "2"
          ]
        },
        " are sufficient for the existence of a solution to problem (",
        {
          "type": "Cite",
          "target": "S2-E1",
          "content": [
            "1"
          ]
        },
        ") (see, e.g., [",
        {
          "type": "Cite",
          "target": "bib-bib37",
          "content": [
            "37"
          ]
        },
        "])."
      ]
    },
    {
      "type": "Paragraph",
      "content": [
        "Since we work on the set ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p8.m1\" alttext=\"\\mathcal{Z}\" display=\"inline\"><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:math>",
          "meta": {
            "altText": "\\mathcal{Z}"
          }
        },
        ", it is useful to introduce the Euclidean projection onto ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p8.m2\" alttext=\"\\mathcal{Z}\" display=\"inline\"><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:math>",
          "meta": {
            "altText": "\\mathcal{Z}"
          }
        },
        ","
      ]
    },
    {
      "type": "MathBlock",
      "id": "S2.Ex1",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.Ex1.m1\" alttext=\"P_{\\mathcal{Z}}(z)=\\arg\\min_{v\\in\\mathcal{Z}}\\lVert z-v\\rVert_{2}.\" class=\"ltx_math_unparsed\" display=\"block\"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:msub><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>arg</mml:mi><mml:munder><mml:mi>min</mml:mi><mml:mrow><mml:mi>v</mml:mi><mml:mo>∈</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:mrow></mml:munder><mml:msub><mml:mrow><mml:mo fence=\"true\" lspace=\"0em\" rspace=\"0em\">∥</mml:mo><mml:mi>z</mml:mi><mml:mo>−</mml:mo><mml:mi>v</mml:mi><mml:mo fence=\"true\" lspace=\"0em\" rspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msub><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "P_{\\mathcal{Z}}(z)=\\arg\\min_{v\\in\\mathcal{Z}}\\lVert z-v\\rVert_{2}."
      }
    },
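    {
      "type": "Paragraph",
      "content": [
        "For simple sets the projection is available in closed form. Two standard cases are given below purely as illustrations; the radius ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "R>0",
          "meta": {
            "altText": "R>0"
          }
        },
        " and the bounds ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "a_{i}\\leq b_{i}",
          "meta": {
            "altText": "a_{i}\\leq b_{i}"
          }
        },
        " are hypothetical parameters, not notation used elsewhere in this paper. For a Euclidean ball and for a box, respectively,"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "tex",
      "text": "\\begin{aligned} \\mathcal{Z}=\\{v:\\lVert v\\rVert_{2}\\leq R\\}: &\\quad P_{\\mathcal{Z}}(z)=\\frac{R}{\\max\\{R,\\lVert z\\rVert_{2}\\}}\\,z,\\\\ \\mathcal{Z}=\\{v:a_{i}\\leq v_{i}\\leq b_{i}\\ \\text{for all}\\ i\\}: &\\quad [P_{\\mathcal{Z}}(z)]_{i}=\\min\\{\\max\\{z_{i},a_{i}\\},b_{i}\\}. \\end{aligned}",
      "meta": {
        "altText": "\\begin{aligned} \\mathcal{Z}=\\{v:\\lVert v\\rVert_{2}\\leq R\\}: &\\quad P_{\\mathcal{Z}}(z)=\\frac{R}{\\max\\{R,\\lVert z\\rVert_{2}\\}}\\,z,\\\\ \\mathcal{Z}=\\{v:a_{i}\\leq v_{i}\\leq b_{i}\\ \\text{for all}\\ i\\}: &\\quad [P_{\\mathcal{Z}}(z)]_{i}=\\min\\{\\max\\{z_{i},a_{i}\\},b_{i}\\}. \\end{aligned}"
      }
    },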
    {
      "type": "Paragraph",
      "content": [
        "To characterize the convergence of methods for monotone variational inequalities, we introduce the gap function,"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S2.E5",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.E5.m1\" alttext=\"\\operatorname{Gap}_{\\mathrm{VI}}(z)≔\\sup_{u\\in\\mathcal{\\mathcal{Z}}}[\\langle F(u),z-u\\rangle].\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>Gap</mml:mi><mml:mi>VI</mml:mi></mml:msub><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>⁢</mml:mo><mml:mi mathvariant=\"normal\">≔</mml:mi><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:munder><mml:mo movablelimits=\"false\" rspace=\"0em\">sup</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>∈</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:mo stretchy=\"false\">[</mml:mo><mml:mrow><mml:mo stretchy=\"false\">⟨</mml:mo><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>u</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi>z</mml:mi><mml:mo>−</mml:mo><mml:mi>u</mml:mi></mml:mrow><mml:mo stretchy=\"false\">⟩</mml:mo></mml:mrow><mml:mo stretchy=\"false\">]</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\operatorname{Gap}_{\\mathrm{VI}}(z)≔\\sup_{u\\in\\mathcal{\\mathcal{Z}}}[\\langle F(u),z-u\\rangle]."
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "Such a gap function, regarded as a convergence criterion, is more suitable for the following variational inequality problem:"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S2.Ex2",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.Ex2.m1\" alttext=\"\\textrm{find}\\ z^{*}\\in\\mathcal{Z}\\ \\textrm{such that}\\ \\langle F(z),z^{*}-z\\rangle\\leq 0\\quad\\textrm{for}\\ z\\in\\mathcal{Z}.\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:mrow><mml:mtext>find</mml:mtext><mml:mo lspace=\"0.500em\">⁢</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:mrow><mml:mo>∈</mml:mo><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mo lspace=\"0.500em\">⁢</mml:mo><mml:mtext>such that</mml:mtext><mml:mo lspace=\"0.500em\">⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">⟨</mml:mo><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mo>−</mml:mo><mml:mi>z</mml:mi></mml:mrow><mml:mo stretchy=\"false\">⟩</mml:mo></mml:mrow></mml:mrow><mml:mo>≤</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mspace width=\"1em\"/><mml:mrow><mml:mrow><mml:mtext>for</mml:mtext><mml:mo lspace=\"0.500em\">⁢</mml:mo><mml:mi>z</mml:mi></mml:mrow><mml:mo>∈</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\textrm{find}\\ z^{*}\\in\\mathcal{Z}\\ \\textrm{such that}\\ \\langle F(z),z^{*}-z\\rangle\\leq 0\\quad\\textrm{for}\\ z\\in\\mathcal{Z}."
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "Such a solution is also called weak or Minty (whereas the solution of (",
        {
          "type": "Cite",
          "target": "S2-E1",
          "content": [
            "1"
          ]
        },
        ") is called strong or Stampacchia).\nHowever, in view of Assumption ",
        {
          "type": "Cite",
          "target": "Thm112assumption1",
          "content": [
            "1"
          ]
        },
        ", ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p9.m1\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
          "meta": {
            "altText": "F"
          }
        },
        " is single-valued and continuous on ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p9.m2\" alttext=\"\\mathcal{Z}\" display=\"inline\"><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:math>",
          "meta": {
            "altText": "\\mathcal{Z}"
          }
        },
        ", meaning that actually the two indicated formulations of the variational inequality problem are equivalent [",
        {
          "type": "Cite",
          "target": "bib-bib37",
          "content": [
            "37"
          ]
        },
        "]."
      ]
    },
    {
      "type": "Paragraph",
      "content": [
        "For the minimization problem (",
        {
          "type": "Cite",
          "target": "S2-E2",
          "content": [
            "2"
          ]
        },
        "), the functional distance to the solution, i.e., the difference ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p10.m1\" alttext=\"f(z)-f(z^{*})\" display=\"inline\"><mml:mrow><mml:mrow><mml:mi>f</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:mi>f</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "f(z)-f(z^{*})"
          }
        },
        ", can be used instead of (",
        {
          "type": "Cite",
          "target": "S2-E5",
          "content": [
            "5"
          ]
        },
        ").\nFor saddle point problems (",
        {
          "type": "Cite",
          "target": "S2-E3",
          "content": [
            "3"
          ]
        },
        "), a slightly different gap function is used, namely,"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S2.E6",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.E6.m1\" alttext=\"\\operatorname{Gap}_{\\mathrm{SPP}}(z)≔\\operatorname{gap}(x,y)=\\max_{y^{\\prime}\\in\\mathcal{Y}}f(x,y^{\\prime})-\\min_{x^{\\prime}\\in\\mathcal{X}}f(x^{\\prime},y).\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>Gap</mml:mi><mml:mi>SPP</mml:mi></mml:msub><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>⁢</mml:mo><mml:mi mathvariant=\"normal\">≔</mml:mi><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mi>gap</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:munder><mml:mi>max</mml:mi><mml:mrow><mml:msup><mml:mi>y</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo>∈</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒴</mml:mi></mml:mrow></mml:munder><mml:mo lspace=\"0.167em\">⁡</mml:mo><mml:mi>f</mml:mi></mml:mrow><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>y</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:mrow><mml:munder><mml:mi>min</mml:mi><mml:mrow><mml:msup><mml:mi>x</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo>∈</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒳</mml:mi></mml:mrow></mml:munder><mml:mo lspace=\"0.167em\">⁡</mml:mo><mml:mi>f</mml:mi></mml:mrow><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\operatorname{Gap}_{\\mathrm{SPP}}(z)≔\\operatorname{gap}(x,y)=\\max_{y^{\\prime}\\in\\mathcal{Y}}f(x,y^{\\prime})-\\min_{x^{\\prime}\\in\\mathcal{X}}f(x^{\\prime},y)."
      }
    },
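    {
      "type": "Paragraph",
      "content": [
        "As a small worked illustration (not taken from the cited literature), consider the bilinear saddle point problem with ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "f(x,y)=xy"
        },
        " on ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\mathcal{X}=\\mathcal{Y}=[-1,1]"
        },
        ", and assume the usual saddle point operator ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "F(z)=(\\nabla_{x}f(x,y),-\\nabla_{y}f(x,y))=(y,-x)"
        },
        " for ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "z=(x,y)"
        },
        ". Writing ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "u=(u_{x},u_{y})"
        },
        ", the cross terms in ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\langle F(u),z-u\\rangle"
        },
        " cancel, and under these assumptions both gap functions evaluate to the same closed form:"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "tex",
      "text": "\\operatorname{Gap}_{\\mathrm{SPP}}(z)=\\max_{y^{\\prime}\\in[-1,1]}xy^{\\prime}-\\min_{x^{\\prime}\\in[-1,1]}x^{\\prime}y=|x|+|y|,\\qquad\\operatorname{Gap}_{\\mathrm{VI}}(z)=\\sup_{u\\in[-1,1]^{2}}\\bigl(u_{y}x-u_{x}y\\bigr)=|x|+|y|."
    },
    {
      "type": "Paragraph",
      "content": [
        "Both quantities vanish only at the saddle point ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "z^{*}=(0,0)"
        },
        ". If the box were replaced by the whole plane, both suprema would be infinite at every ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "z\\neq z^{*}"
        },
        "."
      ]
    },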
    {
      "type": "Paragraph",
      "content": [
        "For both functions (",
        {
          "type": "Cite",
          "target": "S2-E5",
          "content": [
            "5"
          ]
        },
        ") and (",
        {
          "type": "Cite",
          "target": "S2-E6",
          "content": [
            "6"
          ]
        },
        ") it is crucial that the feasible set is bounded (in fact it is not necessary to take the whole set ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p10.m2\" alttext=\"\\mathcal{Z}\" display=\"inline\"><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:math>",
          "meta": {
            "altText": "\\mathcal{Z}"
          }
        },
        ", which can be unbounded – it suffices to take a bounded convex subset ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p10.m3\" alttext=\"\\mathcal{C}\" display=\"inline\"><mml:mi class=\"ltx_font_mathcaligraphic\">𝒞</mml:mi></mml:math>",
          "meta": {
            "altText": "\\mathcal{C}"
          }
        },
        " which contains some solution, see [",
        {
          "type": "Cite",
          "target": "bib-bib95",
          "content": [
            "95"
          ]
        },
        "]).\nTherefore it is necessary to define a distance on the set ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p10.m4\" alttext=\"\\mathcal{Z}\" display=\"inline\"><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:math>",
          "meta": {
            "altText": "\\mathcal{Z}"
          }
        },
        ".\nSince this survey covers methods not only in the Euclidean setup, let us introduce a more general notion of distance."
      ]
    },
    {
      "type": "Claim",
      "id": "Thm112definition1",
      "label": "Definition 1(Bregman divergence).",
      "title": [
        {
          "type": "Strong",
          "content": [
            "Definition 1"
          ]
        },
        " (Bregman divergence)",
        {
          "type": "Strong",
          "content": [
            "."
          ]
        }
      ],
      "content": [
        {
          "type": "Paragraph",
          "content": [
            "Let ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112definition1.p1.m1\" alttext=\"\\nu(z)\" display=\"inline\"><mml:mrow><mml:mi>ν</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
              "meta": {
                "altText": "\\nu(z)"
              }
            },
            " be a function that is ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112definition1.p1.m2\" alttext=\"1\" display=\"inline\"><mml:mn>1</mml:mn></mml:math>",
              "meta": {
                "altText": "1"
              }
            },
            "-strongly convex w.r.t. the norm ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112definition1.p1.m3\" alttext=\"\\lVert\\cdot\\rVert\" display=\"inline\"><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mo lspace=\"0em\" rspace=\"0em\">⋅</mml:mo><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow></mml:math>",
              "meta": {
                "altText": "\\lVert\\cdot\\rVert"
              }
            },
            " and differentiable on ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112definition1.p1.m4\" alttext=\"\\mathcal{Z}\" display=\"inline\"><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:math>",
              "meta": {
                "altText": "\\mathcal{Z}"
              }
            },
            ".\nThen for any two points ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112definition1.p1.m5\" alttext=\"z,z^{\\prime}\\in\\mathcal{Z}\" display=\"inline\"><mml:mrow><mml:mrow><mml:mi>z</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow><mml:mo>∈</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:mrow></mml:math>",
              "meta": {
                "altText": "z,z^{\\prime}\\in\\mathcal{Z}"
              }
            },
            " the Bregman divergence (or Bregman distance) ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112definition1.p1.m6\" alttext=\"V(z,z^{\\prime})\" display=\"inline\"><mml:mrow><mml:mi>V</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
              "meta": {
                "altText": "V(z,z^{\\prime})"
              }
            },
            " associated with ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112definition1.p1.m7\" alttext=\"\\nu(z)\" display=\"inline\"><mml:mrow><mml:mi>ν</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
              "meta": {
                "altText": "\\nu(z)"
              }
            },
            " is defined as"
          ]
        },
        {
          "type": "MathBlock",
          "id": "S2.Ex3",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.Ex3.m1\" alttext=\"V(z,z^{\\prime})≔\\nu(z^{\\prime})-\\nu(z)-\\langle\\nabla\\nu(z),z^{\\prime}-z\\rangle.\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:mi>V</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow><mml:mo>⁢</mml:mo><mml:mi mathvariant=\"normal\">≔</mml:mi><mml:mo>⁢</mml:mo><mml:mi>ν</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:mi>ν</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:mo stretchy=\"false\">⟨</mml:mo><mml:mrow><mml:mrow><mml:mo rspace=\"0.167em\">∇</mml:mo><mml:mi>ν</mml:mi></mml:mrow><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo>−</mml:mo><mml:mi>z</mml:mi></mml:mrow><mml:mo stretchy=\"false\">⟩</mml:mo></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
          "meta": {
            "altText": "V(z,z^{\\prime})≔\\nu(z^{\\prime})-\\nu(z)-\\langle\\nabla\\nu(z),z^{\\prime}-z\\rangle."
          }
        }
      ]
    },
    {
      "type": "Paragraph",
      "content": [
        "We denote the Bregman diameter of the set ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p11.m1\" alttext=\"\\mathcal{Z}\" display=\"inline\"><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:math>",
          "meta": {
            "altText": "\\mathcal{Z}"
          }
        },
        " w.r.t. the divergence ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p11.m2\" alttext=\"V(z,z^{\\prime})\" display=\"inline\"><mml:mrow><mml:mi>V</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "V(z,z^{\\prime})"
          }
        },
        " as ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p11.m3\" alttext=\"D_{\\mathcal{Z},V}≔\\max\\{\\sqrt{2V(z,z^{\\prime})}\\mid z,z^{\\prime}\\in\\mathcal{Z}\\}\" display=\"inline\"><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mo>,</mml:mo><mml:mi>V</mml:mi></mml:mrow></mml:msub><mml:mo>⁢</mml:mo><mml:mi mathvariant=\"normal\">≔</mml:mi><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mi>max</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">{</mml:mo><mml:mrow><mml:mrow><mml:msqrt><mml:mrow><mml:mn>2</mml:mn><mml:mo>⁢</mml:mo><mml:mi>V</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:msqrt><mml:mo>∣</mml:mo><mml:mrow><mml:mi>z</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:mrow><mml:mo>∈</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:mrow><mml:mo stretchy=\"false\">}</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "D_{\\mathcal{Z},V}≔\\max\\{\\sqrt{2V(z,z^{\\prime})}\\mid z,z^{\\prime}\\in\\mathcal{Z}\\}"
          }
        },
        ".\nIn the Euclidean case, we simply write ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p11.m4\" alttext=\"D_{\\mathcal{Z}}\" display=\"inline\"><mml:msub><mml:mi>D</mml:mi><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:msub></mml:math>",
          "meta": {
            "altText": "D_{\\mathcal{Z}}"
          }
        },
        " instead of ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p11.m5\" alttext=\"D_{\\mathcal{Z},V}\" display=\"inline\"><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mo>,</mml:mo><mml:mi>V</mml:mi></mml:mrow></mml:msub></mml:math>",
          "meta": {
            "altText": "D_{\\mathcal{Z},V}"
          }
        },
        ".\nUsing the definition of ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.p11.m6\" alttext=\"V\" display=\"inline\"><mml:mi>V</mml:mi></mml:math>",
          "meta": {
            "altText": "V"
          }
        },
        ", we introduce the so-called proximal operator as follows:"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S2.Ex4",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S2.Ex4.m1\" alttext=\"\\operatorname{prox}_{x}(y)=\\arg\\min_{z\\in\\mathcal{Z}}\\{\\langle y,z\\rangle+V(z,x)\\}.\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>prox</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mi>arg</mml:mi><mml:mo lspace=\"0.167em\">⁡</mml:mo><mml:mrow><mml:munder><mml:mi>min</mml:mi><mml:mrow><mml:mi>z</mml:mi><mml:mo>∈</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:mrow></mml:munder><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">{</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy=\"false\">⟨</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">⟩</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi>V</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo>,</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">}</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\operatorname{prox}_{x}(y)=\\arg\\min_{z\\in\\mathcal{Z}}\\{\\langle y,z\\rangle+V(z,x)\\}."
      }
    },
    {
      "type": "Heading",
      "id": "S3",
      "depth": 1,
      "content": [
        "3 Deterministic foundation: Extragradient and other methods"
      ]
    },
    {
      "type": "Paragraph",
      "content": [
        "The first and the simplest method for solving the variational inequality (",
        {
          "type": "Cite",
          "target": "S2-E1",
          "content": [
            "1"
          ]
        },
        ") is the iterative scheme (also known as the Gradient method)"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S3.E7",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.E7.m1\" alttext=\"z^{k+1}=P_{\\mathcal{Z}}(z^{k}-\\gamma F(z^{k})),\" display=\"block\"><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "z^{k+1}=P_{\\mathcal{Z}}(z^{k}-\\gamma F(z^{k})),"
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "where ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p1.m1\" alttext=\"\\gamma>0\" display=\"inline\"><mml:mrow><mml:mi>γ</mml:mi><mml:mo>&gt;</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\gamma>0"
          }
        },
        " is a step size.\nNote that using the proximal operator associated with the Euclidean Bregman divergence this method can be rewritten in the form"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S3.Ex5",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.Ex5.m1\" alttext=\"z^{k+1}=\\operatorname{prox}_{z^{k}}(\\gamma F(z^{k})).\" display=\"block\"><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>prox</mml:mi><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:msub><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "z^{k+1}=\\operatorname{prox}_{z^{k}}(\\gamma F(z^{k}))."
      }
    },
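    {
      "type": "Paragraph",
      "content": [
        "To see why the two forms agree, here is a short derivation under the assumption that the Euclidean setup corresponds to ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\nu(z)=\\tfrac{1}{2}\\lVert z\\rVert_{2}^{2}"
        },
        ", for which ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "V(z,z^{\\prime})=\\tfrac{1}{2}\\lVert z-z^{\\prime}\\rVert_{2}^{2}"
        },
        " is symmetric in its arguments. Completing the square and dropping the terms that do not depend on ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "z"
        },
        " gives"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "tex",
      "text": "\\operatorname{prox}_{x}(y)=\\arg\\min_{z\\in\\mathcal{Z}}\\Bigl\\{\\langle y,z\\rangle+\\tfrac{1}{2}\\lVert z-x\\rVert_{2}^{2}\\Bigr\\}=\\arg\\min_{z\\in\\mathcal{Z}}\\tfrac{1}{2}\\lVert z-(x-y)\\rVert_{2}^{2}=P_{\\mathcal{Z}}(x-y)."
    },
    {
      "type": "Paragraph",
      "content": [
        "Substituting ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "x=z^{k}"
        },
        " and ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "y=\\gamma F(z^{k})"
        },
        " recovers the projection step (",
        {
          "type": "Cite",
          "target": "S3-E7",
          "content": [
            "7"
          ]
        },
        ")."
      ]
    },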
    {
      "type": "Paragraph",
      "content": [
        "The basic result asserts the convergence of the method to the unique solution of (",
        {
          "type": "Cite",
          "target": "S2-E1",
          "content": [
            "1"
          ]
        },
        ") for strongly monotones and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p1.m2\" alttext=\"L\" display=\"inline\"><mml:mi>L</mml:mi></mml:math>",
          "meta": {
            "altText": "L"
          }
        },
        "-Lipschitz operators ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p1.m3\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
          "meta": {
            "altText": "F"
          }
        },
        "; it was obtained in the papers [",
        {
          "type": "Cite",
          "target": "bib-bib19",
          "content": [
            "19"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib106",
          "content": [
            "106"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib110",
          "content": [
            "110"
          ]
        },
        "]."
      ]
    },
    {
      "type": "Claim",
      "id": "Thm112theorem1",
      "claimType": "Theorem",
      "label": "Theorem 1.",
      "title": [
        {
          "type": "Strong",
          "content": [
            "Theorem 1"
          ]
        },
        {
          "type": "Strong",
          "content": [
            "."
          ]
        }
      ],
      "content": [
        {
          "type": "Paragraph",
          "content": [
            {
              "type": "Emphasis",
              "content": [
                "If Assumptions ",
                {
                  "type": "Cite",
                  "target": "Thm112assumption1",
                  "content": [
                    "1"
                  ]
                },
                " and ",
                {
                  "type": "Cite",
                  "target": "Thm112assumption2",
                  "content": [
                    "2"
                  ]
                },
                " hold and ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem1.p1.m1\" alttext=\"0<\\gamma<2\\mu/L^{2}\" display=\"inline\"><mml:mrow><mml:mn mathvariant=\"normal\">0</mml:mn><mml:mo mathvariant=\"normal\">&lt;</mml:mo><mml:mi>γ</mml:mi><mml:mo mathvariant=\"normal\">&lt;</mml:mo><mml:mrow><mml:mrow><mml:mn mathvariant=\"normal\">2</mml:mn><mml:mo mathvariant=\"italic\">⁢</mml:mo><mml:mi>μ</mml:mi></mml:mrow><mml:mo mathvariant=\"normal\">/</mml:mo><mml:msup><mml:mi>L</mml:mi><mml:mn mathvariant=\"normal\">2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:math>",
                  "meta": {
                    "altText": "0<\\gamma<2\\mu/L^{2}"
                  }
                },
                ", then after ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem1.p1.m2\" alttext=\"k\" display=\"inline\"><mml:mi>k</mml:mi></mml:math>",
                  "meta": {
                    "altText": "k"
                  }
                },
                " iterations method (",
                {
                  "type": "Cite",
                  "target": "S3-E7",
                  "content": [
                    "7"
                  ]
                },
                ") converges to ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem1.p1.m3\" alttext=\"z^{*}\" display=\"inline\"><mml:msup><mml:mi>z</mml:mi><mml:mo mathvariant=\"normal\">*</mml:mo></mml:msup></mml:math>",
                  "meta": {
                    "altText": "z^{*}"
                  }
                },
                " with a linear rate:"
              ]
            }
          ]
        },
        {
          "type": "MathBlock",
          "id": "S3.Ex6",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.Ex6.m1\" alttext=\"\\lVert z^{k}-z^{*}\\rVert_{2}^{2}=\\mathcal{O}(R_{0}^{2}q^{k}),\\quad\\textrm{with}\\ q=(1-2\\gamma\\mu+\\gamma^{2}L^{2})\" display=\"block\"><mml:mrow><mml:mrow><mml:msubsup><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo>⁢</mml:mo><mml:msup><mml:mi>q</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo rspace=\"1.167em\">,</mml:mo><mml:mrow><mml:mrow><mml:mtext>𝑤𝑖𝑡ℎ</mml:mtext><mml:mo lspace=\"0.500em\">⁢</mml:mo><mml:mi>q</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mrow><mml:mn>2</mml:mn><mml:mo>⁢</mml:mo><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:mi>μ</mml:mi></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:msup><mml:mi>γ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>⁢</mml:mo><mml:msup><mml:mi>L</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\lVert z^{k}-z^{*}\\rVert_{2}^{2}=\\mathcal{O}(R_{0}^{2}q^{k}),\\quad\\textrm{with}\\ q=(1-2\\gamma\\mu+\\gamma^{2}L^{2})"
          }
        },
        {
          "type": "Paragraph",
          "content": [
            {
              "type": "Emphasis",
              "content": [
                "and ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem1.p1.m4\" alttext=\"R_{0}\" display=\"inline\"><mml:msub><mml:mi>R</mml:mi><mml:mn mathvariant=\"normal\">0</mml:mn></mml:msub></mml:math>",
                  "meta": {
                    "altText": "R_{0}"
                  }
                },
                " denotes (here and everywhere in the sequel) the norm ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem1.p1.m5\" alttext=\"\\lVert z^{0}-z^{*}\\rVert_{2}\" display=\"inline\"><mml:msub><mml:mrow><mml:mo fence=\"true\" mathvariant=\"normal\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mn mathvariant=\"normal\">0</mml:mn></mml:msup><mml:mo mathvariant=\"normal\">−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo mathvariant=\"normal\">*</mml:mo></mml:msup></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\" mathvariant=\"normal\">∥</mml:mo></mml:mrow><mml:mn mathvariant=\"normal\">2</mml:mn></mml:msub></mml:math>",
                  "meta": {
                    "altText": "\\lVert z^{0}-z^{*}\\rVert_{2}"
                  }
                },
                ".\nFor ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem1.p1.m6\" alttext=\"\\gamma=\\mu/L^{2}\" display=\"inline\"><mml:mrow><mml:mi>γ</mml:mi><mml:mo mathvariant=\"normal\">=</mml:mo><mml:mrow><mml:mi>μ</mml:mi><mml:mo mathvariant=\"normal\">/</mml:mo><mml:msup><mml:mi>L</mml:mi><mml:mn mathvariant=\"normal\">2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:math>",
                  "meta": {
                    "altText": "\\gamma=\\mu/L^{2}"
                  }
                },
                ", we have ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem1.p1.m7\" alttext=\"q=(1-1/\\kappa^{2})\" display=\"inline\"><mml:mrow><mml:mi>q</mml:mi><mml:mo mathvariant=\"normal\">=</mml:mo><mml:mrow><mml:mo mathvariant=\"normal\" stretchy=\"false\">(</mml:mo><mml:mrow><mml:mn mathvariant=\"normal\">1</mml:mn><mml:mo mathvariant=\"normal\">−</mml:mo><mml:mrow><mml:mn mathvariant=\"normal\">1</mml:mn><mml:mo mathvariant=\"normal\">/</mml:mo><mml:msup><mml:mi>κ</mml:mi><mml:mn mathvariant=\"normal\">2</mml:mn></mml:msup></mml:mrow></mml:mrow><mml:mo mathvariant=\"normal\" stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
                  "meta": {
                    "altText": "q=(1-1/\\kappa^{2})"
                  }
                },
                ", ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem1.p1.m8\" alttext=\"\\kappa=L/\\mu\" display=\"inline\"><mml:mrow><mml:mi>κ</mml:mi><mml:mo mathvariant=\"normal\">=</mml:mo><mml:mrow><mml:mi>L</mml:mi><mml:mo mathvariant=\"normal\">/</mml:mo><mml:mi>μ</mml:mi></mml:mrow></mml:mrow></mml:math>",
                  "meta": {
                    "altText": "\\kappa=L/\\mu"
                  }
                },
                ", thus the upper bound on the number of iterations needed to achieve the ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem1.p1.m9\" alttext=\"\\varepsilon\" display=\"inline\"><mml:mi>ε</mml:mi></mml:math>",
                  "meta": {
                    "altText": "\\varepsilon"
                  }
                },
                "-solution (i.e., ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem1.p1.m10\" alttext=\"\\lVert z^{k}-z^{*}\\rVert_{2}^{2}\\leq\\nobreak\\varepsilon\" display=\"inline\"><mml:mrow><mml:msubsup><mml:mrow><mml:mo fence=\"true\" mathvariant=\"normal\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo mathvariant=\"normal\">−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo mathvariant=\"normal\">*</mml:mo></mml:msup></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\" mathvariant=\"normal\">∥</mml:mo></mml:mrow><mml:mn mathvariant=\"normal\">2</mml:mn><mml:mn mathvariant=\"normal\">2</mml:mn></mml:msubsup><mml:mo mathvariant=\"normal\">≤</mml:mo><mml:mi>ε</mml:mi></mml:mrow></mml:math>",
                  "meta": {
                    "altText": "\\lVert z^{k}-z^{*}\\rVert_{2}^{2}\\leq\\nobreak\\varepsilon"
                  }
                },
                ") is ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem1.p1.m11\" alttext=\"\\mathcal{O}(\\kappa^{2}\\log(R_{0}^{2}/\\varepsilon))\" display=\"inline\"><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo mathvariant=\"italic\">⁢</mml:mo><mml:mrow><mml:mo mathvariant=\"normal\" stretchy=\"false\">(</mml:mo><mml:mrow><mml:msup><mml:mi>κ</mml:mi><mml:mn mathvariant=\"normal\">2</mml:mn></mml:msup><mml:mo lspace=\"0.167em\" mathvariant=\"italic\">⁢</mml:mo><mml:mrow><mml:mi mathvariant=\"normal\">log</mml:mi><mml:mo mathvariant=\"italic\">⁡</mml:mo><mml:mrow><mml:mo mathvariant=\"normal\" stretchy=\"false\">(</mml:mo><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mn mathvariant=\"normal\">0</mml:mn><mml:mn mathvariant=\"normal\">2</mml:mn></mml:msubsup><mml:mo mathvariant=\"normal\">/</mml:mo><mml:mi>ε</mml:mi></mml:mrow><mml:mo mathvariant=\"normal\" stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo mathvariant=\"normal\" stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
                  "meta": {
                    "altText": "\\mathcal{O}(\\kappa^{2}\\log(R_{0}^{2}/\\varepsilon))"
                  }
                },
                "."
              ]
            }
          ]
        }
      ]
    },
    {
      "type": "Paragraph",
      "id": "S3.p2",
      "content": [
        "Various extensions of this statement (for the case when ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p2.m1\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
          "meta": {
            "altText": "F"
          }
        },
        " is not Lipschitz, but with linear growth bounds, or when the values of ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p2.m2\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
          "meta": {
            "altText": "F"
          }
        },
        " are corrupted by noise) can be found in [",
        {
          "type": "Cite",
          "target": "bib-bib10",
          "content": [
            "10"
          ]
        },
        ", Theorem 1]."
      ]
    },
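    {
      "type": "Paragraph",
      "content": [
        "As a brief sketch of how the iteration count in Theorem ",
        {
          "type": "Cite",
          "target": "Thm112theorem1",
          "content": [
            "1"
          ]
        },
        " arises (ignoring the constant hidden in the big-O notation), recall the elementary inequality ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"1-t\\leq e^{-t}\" display=\"inline\"><mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo>≤</mml:mo><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>−</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math>",
          "meta": {
            "altText": "1-t\\leq e^{-t}"
          }
        },
        ".\nFor ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"\\gamma=\\mu/L^{2}\" display=\"inline\"><mml:mrow><mml:mi>γ</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mi>μ</mml:mi><mml:mo>/</mml:mo><mml:msup><mml:mi>L</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\gamma=\\mu/L^{2}"
          }
        },
        " it gives"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"R_{0}^{2}q^{k}=R_{0}^{2}(1-1/\\kappa^{2})^{k}\\leq R_{0}^{2}\\exp(-k/\\kappa^{2}),\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo>\u2062</mml:mo><mml:msup><mml:mi>q</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo>\u2062</mml:mo><mml:msup><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:msup><mml:mi>κ</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow><mml:mi>k</mml:mi></mml:msup></mml:mrow><mml:mo>≤</mml:mo><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo>\u2062</mml:mo><mml:mrow><mml:mi>exp</mml:mi><mml:mo>\u2061</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>/</mml:mo><mml:msup><mml:mi>κ</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "R_{0}^{2}q^{k}=R_{0}^{2}(1-1/\\kappa^{2})^{k}\\leq R_{0}^{2}\\exp(-k/\\kappa^{2}),"
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "and the right-hand side is at most ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"\\varepsilon\" display=\"inline\"><mml:mi>ε</mml:mi></mml:math>",
          "meta": {
            "altText": "\\varepsilon"
          }
        },
        " as soon as ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"k\\geq\\kappa^{2}\\log(R_{0}^{2}/\\varepsilon)\" display=\"inline\"><mml:mrow><mml:mi>k</mml:mi><mml:mo>≥</mml:mo><mml:mrow><mml:msup><mml:mi>κ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo lspace=\"0.167em\">\u2062</mml:mo><mml:mrow><mml:mi>log</mml:mi><mml:mo>\u2061</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo>/</mml:mo><mml:mi>ε</mml:mi></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "k\\geq\\kappa^{2}\\log(R_{0}^{2}/\\varepsilon)"
          }
        },
        "."
      ]
    },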
    {
      "type": "Paragraph",
      "id": "S3.p3",
      "content": [
        "When ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p3.m1\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
          "meta": {
            "altText": "F"
          }
        },
        " is a potential operator (see Example ",
        {
          "type": "Cite",
          "target": "Thm112example1",
          "content": [
            "1"
          ]
        },
        ") method (",
        {
          "type": "Cite",
          "target": "S3-E7",
          "content": [
            "7"
          ]
        },
        ") coincides with the gradient projection algorithm.\nIt converges for strongly monotone ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p3.m2\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
          "meta": {
            "altText": "F"
          }
        },
        ".\nMoreover, the bounds for the admissible step size are less restrictive (",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p3.m3\" alttext=\"0<\\gamma<2/L\" display=\"inline\"><mml:mrow><mml:mn>0</mml:mn><mml:mo>&lt;</mml:mo><mml:mi>γ</mml:mi><mml:mo>&lt;</mml:mo><mml:mrow><mml:mn>2</mml:mn><mml:mo>/</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "0<\\gamma<2/L"
          }
        },
        ") and the relevant complexity estimates are better (",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p3.m4\" alttext=\"O(\\kappa\\log(R_{0}^{2}/\\varepsilon))\" display=\"inline\"><mml:mrow><mml:mi>O</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mi>κ</mml:mi><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mi>log</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo>/</mml:mo><mml:mi>ε</mml:mi></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "O(\\kappa\\log(R_{0}^{2}/\\varepsilon))"
          }
        },
        ") than in Theorem ",
        {
          "type": "Cite",
          "target": "Thm112theorem1",
          "content": [
            "1"
          ]
        },
        "; see [",
        {
          "type": "Cite",
          "target": "bib-bib104",
          "content": [
            "104"
          ]
        },
        ", Theorem 2 in Section 1.4.2]."
      ]
    },
    {
      "type": "Paragraph",
      "id": "S3.p4",
      "content": [
        "However, in the general monotone, but not strongly monotone case (for instance, for the convex-concave SPP, Example ",
        {
          "type": "Cite",
          "target": "Thm112example2",
          "content": [
            "2"
          ]
        },
        ") convergence fails.\nThe original statements on the convergence of Uzawa’s method (a version of (",
        {
          "type": "Cite",
          "target": "S3-E7",
          "content": [
            "7"
          ]
        },
        ")) for saddle point problems [",
        {
          "type": "Cite",
          "target": "bib-bib6",
          "content": [
            "6"
          ]
        },
        "] were wrong; there are numerous well-known examples where method (",
        {
          "type": "Cite",
          "target": "S3-E7",
          "content": [
            "7"
          ]
        },
        ") for ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p4.m1\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
          "meta": {
            "altText": "F"
          }
        },
        " corresponding to a bilinear SPP diverges, see, e.g., [",
        {
          "type": "Cite",
          "target": "bib-bib104",
          "content": [
            "104"
          ]
        },
        ", Figure 39]."
      ]
    },
    {
      "type": "Paragraph",
      "id": "S3.p5",
      "content": [
        "There have been many other attempts to recover the convergence of gradient-like methods, not for VIs, but for saddle point problems.\nOne of them is based on the transition to modified Lagrangians when ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p5.m1\" alttext=\"g(x,y)\" display=\"inline\"><mml:mrow><mml:mi>g</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "g(x,y)"
          }
        },
        " is a Lagrange function, see [",
        {
          "type": "Cite",
          "target": "bib-bib45",
          "content": [
            "45"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib104",
          "content": [
            "104"
          ]
        },
        "].\nHowever, we focus on the general VI case.\nA possible approach is based on the idea of ",
        {
          "type": "Emphasis",
          "content": [
            "regularization"
          ]
        },
        ".\nInstead of the monotone variational inequality (",
        {
          "type": "Cite",
          "target": "S2-E1",
          "content": [
            "1"
          ]
        },
        ") one can deal with a regularized inequality, in which the monotone operator ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p5.m2\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
          "meta": {
            "altText": "F"
          }
        },
        " is replaced by strongly monotone one ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p5.m3\" alttext=\"F+\\varepsilon_{k}T\" display=\"inline\"><mml:mrow><mml:mi>F</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:msub><mml:mi>ε</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mi>T</mml:mi></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "F+\\varepsilon_{k}T"
          }
        },
        ", where ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p5.m4\" alttext=\"T(z)\" display=\"inline\"><mml:mrow><mml:mi>T</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "T(z)"
          }
        },
        " is a strongly monotone operator and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p5.m5\" alttext=\"\\varepsilon_{k}>0\" display=\"inline\"><mml:mrow><mml:msub><mml:mi>ε</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&gt;</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\varepsilon_{k}>0"
          }
        },
        " is a regularization parameter.\nIf we denote by ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p5.m6\" alttext=\"z^{k}\" display=\"inline\"><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:math>",
          "meta": {
            "altText": "z^{k}"
          }
        },
        " the solution of the regularized VI, then one can prove that ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p5.m7\" alttext=\"z^{k}\" display=\"inline\"><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:math>",
          "meta": {
            "altText": "z^{k}"
          }
        },
        " converges to ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p5.m8\" alttext=\"z^{*}\" display=\"inline\"><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:math>",
          "meta": {
            "altText": "z^{*}"
          }
        },
        " as ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p5.m9\" alttext=\"\\varepsilon_{k}\\rightarrow 0\" display=\"inline\"><mml:mrow><mml:msub><mml:mi>ε</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy=\"false\">→</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\varepsilon_{k}\\rightarrow 0"
          }
        },
        " (see [",
        {
          "type": "Cite",
          "target": "bib-bib10",
          "content": [
            "10"
          ]
        },
        "]).\nHowever, usually the solution ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p5.m10\" alttext=\"z^{k}\" display=\"inline\"><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:math>",
          "meta": {
            "altText": "z^{k}"
          }
        },
        " is not easily available.\nTo address this problem, an ",
        {
          "type": "Emphasis",
          "content": [
            "iterative regularization"
          ]
        },
        " technique is proposed in [",
        {
          "type": "Cite",
          "target": "bib-bib10",
          "content": [
            "10"
          ]
        },
        "], where one step of the basic method (",
        {
          "type": "Cite",
          "target": "S3-E7",
          "content": [
            "7"
          ]
        },
        ") is applied for the regularized problem.\nStep sizes and regularization parameters can be adjusted to guarantee convergence."
      ]
    },
    {
      "type": "Paragraph",
      "id": "S3.p6",
      "content": [
        "Another technique is based on the Proximal Point Method proposed independently by B. Martinet in [",
        {
          "type": "Cite",
          "target": "bib-bib84",
          "content": [
            "84"
          ]
        },
        "] and by T. Rockafellar in [",
        {
          "type": "Cite",
          "target": "bib-bib107",
          "content": [
            "107"
          ]
        },
        "].\nAt each iteration this methods requires the solution of the VI with the operator ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p6.m1\" alttext=\"F+cI\" display=\"inline\"><mml:mrow><mml:mi>F</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mi>c</mml:mi><mml:mo>⁢</mml:mo><mml:mi>I</mml:mi></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "F+cI"
          }
        },
        ", where ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p6.m2\" alttext=\"c>0\" display=\"inline\"><mml:mrow><mml:mi>c</mml:mi><mml:mo>&gt;</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math>",
          "meta": {
            "altText": "c>0"
          }
        },
        " and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p6.m3\" alttext=\"I\" display=\"inline\"><mml:mi>I</mml:mi></mml:math>",
          "meta": {
            "altText": "I"
          }
        },
        " is the identity operator.\nThis is an implicit method (similar to the regularization method), however there exist numerous implementable versions of Proximal Point.\nFor instance, some methods discussed below can be considered from this point of view."
      ]
    },
    {
      "type": "Paragraph",
      "content": [
        "The breakthrough in methods for solving (non-strongly) monotone variational inequalities was made by Galina Korpelevich [",
        {
          "type": "Cite",
          "target": "bib-bib64",
          "content": [
            "64"
          ]
        },
        "].\nShe exploited the idea of extrapolation for the gradient method.\nHow this works can be explained for the simplest example of a two-dimensional min-max problem with ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p7.m1\" alttext=\"g(x,y)=xy\" display=\"inline\"><mml:mrow><mml:mrow><mml:mi>g</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>⁢</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "g(x,y)=xy"
          }
        },
        " and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p7.m2\" alttext=\"\\mathcal{Z}=\\mathbb{R}^{2}\" display=\"inline\"><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\mathcal{Z}=\\mathbb{R}^{2}"
          }
        },
        ".\nIt has the unique saddle point ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p7.m3\" alttext=\"z=0\" display=\"inline\"><mml:mrow><mml:mi>z</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math>",
          "meta": {
            "altText": "z=0"
          }
        },
        ", and in any point ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p7.m4\" alttext=\"z^{k}\" display=\"inline\"><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:math>",
          "meta": {
            "altText": "z^{k}"
          }
        },
        " the direction ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p7.m5\" alttext=\"F(z^{k})\" display=\"inline\"><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "F(z^{k})"
          }
        },
        " is orthogonal to ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p7.m6\" alttext=\"z^{k}\" display=\"inline\"><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:math>",
          "meta": {
            "altText": "z^{k}"
          }
        },
        "; thus, the iteration (",
        {
          "type": "Cite",
          "target": "S3-E7",
          "content": [
            "7"
          ]
        },
        ") increases the distance to the saddle point.\nHowever, if we perform the step (",
        {
          "type": "Cite",
          "target": "S3-E7",
          "content": [
            "7"
          ]
        },
        ") and get the extrapolated point ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p7.m7\" alttext=\"z^{k+1/2}\" display=\"inline\"><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup></mml:math>",
          "meta": {
            "altText": "z^{k+1/2}"
          }
        },
        ", the direction ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p7.m8\" alttext=\"-F(z^{k+1/2})\" display=\"inline\"><mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "-F(z^{k+1/2})"
          }
        },
        " is attracted to the saddle point.\nThus, the Extragradient method for solving (",
        {
          "type": "Cite",
          "target": "S2-E1",
          "content": [
            "1"
          ]
        },
        ") reads"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S3.Ex7X",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.Ex7X.m1\" alttext=\"\\displaystyle z^{k+1/2}=P_{\\mathcal{Z}}(z^{k}-\\gamma F(z^{k})),\" display=\"inline\"><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle z^{k+1/2}=P_{\\mathcal{Z}}(z^{k}-\\gamma F(z^{k})),"
      }
    },
    {
      "type": "MathBlock",
      "id": "S3.Ex7Xa",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.Ex7Xa.m1\" alttext=\"\\displaystyle z^{k+1}=P_{\\mathcal{Z}}(z^{k}-\\gamma F(z^{k+1/2})).\" display=\"inline\"><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle z^{k+1}=P_{\\mathcal{Z}}(z^{k}-\\gamma F(z^{k+1/2}))."
      }
    },
    {
      "type": "Claim",
      "id": "Thm112theorem2",
      "claimType": "Theorem",
      "label": "Theorem 2.",
      "title": [
        {
          "type": "Strong",
          "content": [
            "Theorem 2"
          ]
        },
        {
          "type": "Strong",
          "content": [
            "."
          ]
        }
      ],
      "content": [
        {
          "type": "Paragraph",
          "id": "Thm112theorem2.p1",
          "content": [
            {
              "type": "Emphasis",
              "content": [
                "Let ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem2.p1.m1\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
                  "meta": {
                    "altText": "F"
                  }
                },
                " satisfy Assumptions ",
                {
                  "type": "Cite",
                  "target": "Thm112assumption1",
                  "content": [
                    "1"
                  ]
                },
                " and ",
                {
                  "type": "Cite",
                  "target": "Thm112assumption2",
                  "content": [
                    "2"
                  ]
                },
                " (with ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem2.p1.m2\" alttext=\"\\mu=0\" display=\"inline\"><mml:mrow><mml:mi>μ</mml:mi><mml:mo mathvariant=\"normal\">=</mml:mo><mml:mn mathvariant=\"normal\">0</mml:mn></mml:mrow></mml:math>",
                  "meta": {
                    "altText": "\\mu=0"
                  }
                },
                ") and let ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem2.p1.m3\" alttext=\"0<\\gamma<1/L\" display=\"inline\"><mml:mrow><mml:mn mathvariant=\"normal\">0</mml:mn><mml:mo mathvariant=\"normal\">&lt;</mml:mo><mml:mi>γ</mml:mi><mml:mo mathvariant=\"normal\">&lt;</mml:mo><mml:mrow><mml:mn mathvariant=\"normal\">1</mml:mn><mml:mo mathvariant=\"normal\">/</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:mrow></mml:math>",
                  "meta": {
                    "altText": "0<\\gamma<1/L"
                  }
                },
                ".\nThen the sequence of iterates ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem2.p1.m4\" alttext=\"z^{k}\" display=\"inline\"><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:math>",
                  "meta": {
                    "altText": "z^{k}"
                  }
                },
                " generated by the Extragradient method converges to ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem2.p1.m5\" alttext=\"z^{\\star}\" display=\"inline\"><mml:msup><mml:mi>z</mml:mi><mml:mo mathvariant=\"normal\">⋆</mml:mo></mml:msup></mml:math>",
                  "meta": {
                    "altText": "z^{\\star}"
                  }
                },
                "."
              ]
            }
          ]
        }
      ]
    },
    {
      "type": "Paragraph",
      "id": "S3.p8",
      "content": [
        "For the particular cases of the zero-sum matrix game or the general bilinear problem with ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p8.m1\" alttext=\"g(x,y)=y^{\\top}\\mathbf{A}x-b^{\\top}x+c^{\\top}y\" display=\"inline\"><mml:mrow><mml:mrow><mml:mi>g</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:msup><mml:mi>y</mml:mi><mml:mo>⊤</mml:mo></mml:msup><mml:mo>⁢</mml:mo><mml:mi>𝐀</mml:mi><mml:mo>⁢</mml:mo><mml:mi>x</mml:mi></mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:msup><mml:mi>b</mml:mi><mml:mo>⊤</mml:mo></mml:msup><mml:mo>⁢</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:msup><mml:mi>c</mml:mi><mml:mo>⊤</mml:mo></mml:msup><mml:mo>⁢</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "g(x,y)=y^{\\top}\\mathbf{A}x-b^{\\top}x+c^{\\top}y"
          }
        },
        ", the method converges linearly, provided that the optimal solution is unique (see [",
        {
          "type": "Cite",
          "target": "bib-bib64",
          "content": [
            "64"
          ]
        },
        ", Theorem 3]).\nIn this case, the iteration complexity is ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p8.m2\" alttext=\"\\mathcal{O}(\\kappa\\log(R_{0}^{2}/\\varepsilon))\" display=\"inline\"><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mi>κ</mml:mi><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mi>log</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo>/</mml:mo><mml:mi>ε</mml:mi></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\mathcal{O}(\\kappa\\log(R_{0}^{2}/\\varepsilon))"
          }
        },
        " with ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p8.m3\" alttext=\"\\kappa=\\lambda_{\\mathrm{max}}(\\mathbf{A}\\mathbf{A}^{\\top})/\\lambda_{\\mathrm{min}}(\\mathbf{A}\\mathbf{A}^{\\top})\" display=\"inline\"><mml:mrow><mml:mi>κ</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>λ</mml:mi><mml:mi>max</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>𝐀𝐀</mml:mi><mml:mo>⊤</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>/</mml:mo><mml:msub><mml:mi>λ</mml:mi><mml:mi>min</mml:mi></mml:msub></mml:mrow><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>𝐀𝐀</mml:mi><mml:mo>⊤</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\kappa=\\lambda_{\\mathrm{max}}(\\mathbf{A}\\mathbf{A}^{\\top})/\\lambda_{\\mathrm{min}}(\\mathbf{A}\\mathbf{A}^{\\top})"
          }
        },
        ".\nMore general upper bounds for the Extragradient method can be found in [",
        {
          "type": "Cite",
          "target": "bib-bib119",
          "content": [
            "119"
          ]
        },
        "] and in the recent paper [",
        {
          "type": "Cite",
          "target": "bib-bib87",
          "content": [
            "87"
          ]
        },
        "].\nIn particular, for the strongly monotone case the estimate ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p8.m4\" alttext=\"O(\\kappa\\log(R_{0}^{2}/\\varepsilon))\" display=\"inline\"><mml:mrow><mml:mi>O</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mi>κ</mml:mi><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mi>log</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo>/</mml:mo><mml:mi>ε</mml:mi></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "O(\\kappa\\log(R_{0}^{2}/\\varepsilon))"
          }
        },
        " with ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p8.m5\" alttext=\"\\kappa=L/\\mu\" display=\"inline\"><mml:mrow><mml:mi>κ</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mi>L</mml:mi><mml:mo>/</mml:mo><mml:mi>μ</mml:mi></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\kappa=L/\\mu"
          }
        },
        " holds true (compare with the much worse bound ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p8.m6\" alttext=\"O(\\kappa^{2}\\log(R_{0}^{2}/\\varepsilon))\" display=\"inline\"><mml:mrow><mml:mi>O</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msup><mml:mi>κ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mi>log</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo>/</mml:mo><mml:mi>ε</mml:mi></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "O(\\kappa^{2}\\log(R_{0}^{2}/\\varepsilon))"
          }
        },
        " for the Gradient method).\nAn adaptive version of the Extragradient method (no knowledge of ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p8.m7\" alttext=\"L\" display=\"inline\"><mml:mi>L</mml:mi></mml:math>",
          "meta": {
            "altText": "L"
          }
        },
        " is required) is proposed in [",
        {
          "type": "Cite",
          "target": "bib-bib61",
          "content": [
            "61"
          ]
        },
        "]."
      ]
    },
    {
      "type": "Paragraph",
      "content": [
        "Another version of the Extragradient method for finding saddle points is provided in [",
        {
          "type": "Cite",
          "target": "bib-bib65",
          "content": [
            "65"
          ]
        },
        "].\nConsidering the setup of Example ",
        {
          "type": "Cite",
          "target": "Thm112example2",
          "content": [
            "2"
          ]
        },
        ", we can use a single extrapolation step for the variable ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p9.m1\" alttext=\"y\" display=\"inline\"><mml:mi>y</mml:mi></mml:math>",
          "meta": {
            "altText": "y"
          }
        },
        ":"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S3.E8X",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.E8X.m1\" alttext=\"\\displaystyle y^{k+1/2}=P_{\\mathcal{Y}}(y^{k}+\\gamma\\nabla_{y}g(x^{k},y^{k})),\" display=\"inline\"><mml:mrow><mml:mrow><mml:msup><mml:mi>y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi class=\"ltx_font_mathcaligraphic\">𝒴</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msup><mml:mi>y</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>+</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:msub><mml:mo rspace=\"0.167em\">∇</mml:mo><mml:mi>y</mml:mi></mml:msub><mml:mi>g</mml:mi></mml:mrow><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mi>y</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle y^{k+1/2}=P_{\\mathcal{Y}}(y^{k}+\\gamma\\nabla_{y}g(x^{k},y^{k})),"
      }
    },
    {
      "type": "MathBlock",
      "id": "S3.E8Xa",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.E8Xa.m1\" alttext=\"\\displaystyle x^{k+1}=P_{\\mathcal{X}}(x^{k}-\\gamma\\nabla_{x}g(x^{k},y^{k+1/2})),\" display=\"inline\"><mml:mrow><mml:mrow><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi class=\"ltx_font_mathcaligraphic\">𝒳</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msup><mml:mi>x</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:msub><mml:mo rspace=\"0.167em\">∇</mml:mo><mml:mi>x</mml:mi></mml:msub><mml:mi>g</mml:mi></mml:mrow><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mi>y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle x^{k+1}=P_{\\mathcal{X}}(x^{k}-\\gamma\\nabla_{x}g(x^{k},y^{k+1/2})),"
      }
    },
    {
      "type": "MathBlock",
      "id": "S3.E8Xb",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.E8Xb.m1\" alttext=\"\\displaystyle y^{k+1}=y^{k}+q(y^{k+1/2}-y^{k}),\" display=\"inline\"><mml:mrow><mml:mrow><mml:msup><mml:mi>y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msup><mml:mi>y</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>+</mml:mo><mml:mrow><mml:mi>q</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msup><mml:mi>y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mi>y</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle y^{k+1}=y^{k}+q(y^{k+1/2}-y^{k}),"
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "with ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p9.m2\" alttext=\"0<\\gamma<1/(2L)\" display=\"inline\"><mml:mrow><mml:mn>0</mml:mn><mml:mo>&lt;</mml:mo><mml:mi>γ</mml:mi><mml:mo>&lt;</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mn>2</mml:mn><mml:mo>⁢</mml:mo><mml:mi>L</mml:mi></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "0<\\gamma<1/(2L)"
          }
        },
        " and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p9.m3\" alttext=\"0<q<1\" display=\"inline\"><mml:mrow><mml:mn>0</mml:mn><mml:mo>&lt;</mml:mo><mml:mi>q</mml:mi><mml:mo>&lt;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math>",
          "meta": {
            "altText": "0<q<1"
          }
        },
        ".\nThis method converges to the solution; moreover, if ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p9.m4\" alttext=\"g(x,y)\" display=\"inline\"><mml:mrow><mml:mi>g</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "g(x,y)"
          }
        },
        " is linear in ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p9.m5\" alttext=\"y\" display=\"inline\"><mml:mi>y</mml:mi></mml:math>",
          "meta": {
            "altText": "y"
          }
        },
        ", then the convergence is linear.\nIf we set ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p9.m6\" alttext=\"q=1\" display=\"inline\"><mml:mrow><mml:mi>q</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math>",
          "meta": {
            "altText": "q=1"
          }
        },
        " in method (",
        {
          "type": "Cite",
          "target": "S3-E8",
          "content": [
            "8"
          ]
        },
        "), then ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p9.m7\" alttext=\"y^{k+1}=y^{k+1/2}\" display=\"inline\"><mml:mrow><mml:msup><mml:mi>y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi>y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup></mml:mrow></mml:math>",
          "meta": {
            "altText": "y^{k+1}=y^{k+1/2}"
          }
        },
        " and we get the so-called Alternating Gradient Method (alternating descent-ascent).\nIn [",
        {
          "type": "Cite",
          "target": "bib-bib123",
          "content": [
            "123"
          ]
        },
        "], it was proved that this method has ",
        {
          "type": "Emphasis",
          "content": [
            "local"
          ]
        },
        " linear convergence with complexity ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p9.m8\" alttext=\"O(\\kappa\\log(R_{0}^{2}/\\varepsilon))\" display=\"inline\"><mml:mrow><mml:mi>O</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mi>κ</mml:mi><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mi>log</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo>/</mml:mo><mml:mi>ε</mml:mi></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "O(\\kappa\\log(R_{0}^{2}/\\varepsilon))"
          }
        },
        ", where ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p9.m9\" alttext=\"\\kappa=L/\\mu\" display=\"inline\"><mml:mrow><mml:mi>κ</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mi>L</mml:mi><mml:mo>/</mml:mo><mml:mi>μ</mml:mi></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\kappa=L/\\mu"
          }
        },
        "."
      ]
    },
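    {
      "type": "Paragraph",
      "content": [
        "For concreteness, substituting ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "q=1"
        },
        " into the three updates above (a direct substitution, written out here only as an illustration) gives the alternating scheme"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "tex",
      "text": "\\begin{aligned} y^{k+1} &= P_{\\mathcal{Y}}(y^{k}+\\gamma\\nabla_{y}g(x^{k},y^{k})),\\\\ x^{k+1} &= P_{\\mathcal{X}}(x^{k}-\\gamma\\nabla_{x}g(x^{k},y^{k+1})). \\end{aligned}"
    },
    {
      "type": "Paragraph",
      "content": [
        "That is, the ascent step in the second variable is performed first, and the descent step in the first variable then uses the already updated point."
      ]
    },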
    {
      "type": "Paragraph",
      "content": [
        "L. Popov [",
        {
          "type": "Cite",
          "target": "bib-bib105",
          "content": [
            "105"
          ]
        },
        "] proposed a version of the extrapolation scheme (schemes of this type are sometimes referred to as ",
        {
          "type": "Emphasis",
          "content": [
            "optimistic"
          ]
        },
        " or ",
        {
          "type": "Emphasis",
          "content": [
            "single-call"
          ]
        },
        "):"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S3.E9X",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.E9X.m1\" alttext=\"\\displaystyle z^{k+1/2}=P_{\\mathcal{Z}}(z^{k}-\\gamma F(z^{k-1/2})),\" display=\"inline\"><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>−</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle z^{k+1/2}=P_{\\mathcal{Z}}(z^{k}-\\gamma F(z^{k-1/2})),"
      }
    },
    {
      "type": "MathBlock",
      "id": "S3.E9Xa",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.E9Xa.m1\" alttext=\"\\displaystyle z^{k+1}=P_{\\mathcal{Z}}(z^{k}-\\gamma F(z^{k+1/2})).\" display=\"inline\"><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle z^{k+1}=P_{\\mathcal{Z}}(z^{k}-\\gamma F(z^{k+1/2}))."
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "It requires a single evaluation of ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p10.m1\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
          "meta": {
            "altText": "F"
          }
        },
        " per iteration, versus two evaluations in the Extragradient method.\nAs shown in [",
        {
          "type": "Cite",
          "target": "bib-bib105",
          "content": [
            "105"
          ]
        },
        "], method (",
        {
          "type": "Cite",
          "target": "S3-E9",
          "content": [
            "9"
          ]
        },
        ") converges for ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p10.m2\" alttext=\"0<\\gamma<1/(3L)\" display=\"inline\"><mml:mrow><mml:mn>0</mml:mn><mml:mo>&lt;</mml:mo><mml:mi>γ</mml:mi><mml:mo>&lt;</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mn>3</mml:mn><mml:mo>⁢</mml:mo><mml:mi>L</mml:mi></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "0<\\gamma<1/(3L)"
          }
        },
        ".\nRates of convergence for this method were derived recently in [",
        {
          "type": "Cite",
          "target": "bib-bib41",
          "content": [
            "41"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib87",
          "content": [
            "87"
          ]
        },
        "], namely, ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p10.m3\" alttext=\"O(\\kappa\\log(R_{0}^{2}/\\varepsilon))\" display=\"inline\"><mml:mrow><mml:mi>O</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mi>κ</mml:mi><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mi>log</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo>/</mml:mo><mml:mi>ε</mml:mi></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "O(\\kappa\\log(R_{0}^{2}/\\varepsilon))"
          }
        },
        " with ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p10.m4\" alttext=\"\\kappa=L/\\mu\" display=\"inline\"><mml:mrow><mml:mi>κ</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mi>L</mml:mi><mml:mo>/</mml:mo><mml:mi>μ</mml:mi></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\kappa=L/\\mu"
          }
        },
        " for the strongly monotone case and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p10.m5\" alttext=\"\\kappa=\\lambda_{\\mathrm{max}}(\\mathbf{A}\\mathbf{A}^{\\top})/\\lambda_{\\mathrm{min}}(\\mathbf{A}\\mathbf{A}^{\\top})\" display=\"inline\"><mml:mrow><mml:mi>κ</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>λ</mml:mi><mml:mi>max</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>𝐀𝐀</mml:mi><mml:mo>⊤</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>/</mml:mo><mml:msub><mml:mi>λ</mml:mi><mml:mi>min</mml:mi></mml:msub></mml:mrow><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>𝐀𝐀</mml:mi><mml:mo>⊤</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\kappa=\\lambda_{\\mathrm{max}}(\\mathbf{A}\\mathbf{A}^{\\top})/\\lambda_{\\mathrm{min}}(\\mathbf{A}\\mathbf{A}^{\\top})"
          }
        },
        " for the bilinear case.\nNote that in the general strongly monotone case this estimate is optimal [",
        {
          "type": "Cite",
          "target": "bib-bib124",
          "content": [
            "124"
          ]
        },
        "], but for the bilinear problem the upper bounds available in the literature for both the Extragradient and optimistic methods are not tight [",
        {
          "type": "Cite",
          "target": "bib-bib56",
          "content": [
            "56"
          ]
        },
        "].\nMeanwhile, optimal estimates ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p10.m6\" alttext=\"O(\\sqrt{\\kappa}\\log(R_{0}^{2}/\\varepsilon))\" display=\"inline\"><mml:mrow><mml:mi>O</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msqrt><mml:mi>κ</mml:mi></mml:msqrt><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mi>log</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo>/</mml:mo><mml:mi>ε</mml:mi></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "O(\\sqrt{\\kappa}\\log(R_{0}^{2}/\\varepsilon))"
          }
        },
        " with ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.p10.m7\" alttext=\"\\kappa=\\lambda_{\\mathrm{max}}(\\mathbf{A}\\mathbf{A}^{\\top})/\\lambda_{\\mathrm{min}}(\\mathbf{A}\\mathbf{A}^{\\top})\" display=\"inline\"><mml:mrow><mml:mi>κ</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>λ</mml:mi><mml:mi>max</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>𝐀𝐀</mml:mi><mml:mo>⊤</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>/</mml:mo><mml:msub><mml:mi>λ</mml:mi><mml:mi>min</mml:mi></mml:msub></mml:mrow><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>𝐀𝐀</mml:mi><mml:mo>⊤</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\kappa=\\lambda_{\\mathrm{max}}(\\mathbf{A}\\mathbf{A}^{\\top})/\\lambda_{\\mathrm{min}}(\\mathbf{A}\\mathbf{A}^{\\top})"
          }
        },
        " can be achieved using approaches from [",
        {
          "type": "Cite",
          "target": "bib-bib7",
          "content": [
            "7"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib4",
          "content": [
            "4"
          ]
        },
        "]."
      ]
    },
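    {
      "type": "Paragraph",
      "content": [
        "As an illustration of why Popov's scheme is called optimistic, consider the unconstrained case, assuming here that the projection is the identity map (an assumption made only for this sketch, not taken from the cited works). Combining the second update at iteration ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "k"
        },
        " with the first update at iterations ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "k"
        },
        " and ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "k+1"
        },
        " eliminates the integer iterates and leaves a recursion in the half-iterates alone:"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "tex",
      "text": "\\begin{aligned} z^{k+3/2} &= z^{k+1}-\\gamma F(z^{k+1/2}) = z^{k}-2\\gamma F(z^{k+1/2}),\\\\ z^{k} &= z^{k+1/2}+\\gamma F(z^{k-1/2}),\\\\ z^{k+3/2} &= z^{k+1/2}-2\\gamma F(z^{k+1/2})+\\gamma F(z^{k-1/2}). \\end{aligned}"
    },
    {
      "type": "Paragraph",
      "content": [
        "In this form each step evaluates the operator only once, at the newest half-iterate, and reuses the value stored from the previous step."
      ]
    },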
    {
      "type": "Paragraph",
      "content": [
        "An extension of the above schemes to an arbitrary proximal setup was obtained in the work of A. Nemirovsky [",
        {
          "type": "Cite",
          "target": "bib-bib92",
          "content": [
            "92"
          ]
        },
        "].\nHe proposed the Mirror-Prox method for VIs, exploiting the Bregman divergence:"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S3.E10X",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.E10X.m1\" alttext=\"\\displaystyle z^{k+1/2}=\\operatorname{prox}_{z^{k}}(\\gamma F(z^{k})),\" display=\"inline\"><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>prox</mml:mi><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:msub><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle z^{k+1/2}=\\operatorname{prox}_{z^{k}}(\\gamma F(z^{k})),"
      }
    },
    {
      "type": "MathBlock",
      "id": "S3.E10Xa",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.E10Xa.m1\" alttext=\"\\displaystyle z^{k+1}=\\operatorname{prox}_{z^{k}}(\\gamma F(z^{k+1/2})).\" display=\"inline\"><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>prox</mml:mi><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:msub><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle z^{k+1}=\\operatorname{prox}_{z^{k}}(\\gamma F(z^{k+1/2}))."
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "This yields the following rate-of-convergence result."
      ]
    },
    {
      "type": "Claim",
      "id": "Thm112theorem3",
      "claimType": "Theorem",
      "label": "Theorem 3.",
      "title": [
        {
          "type": "Strong",
          "content": [
            "Theorem 3"
          ]
        },
        {
          "type": "Strong",
          "content": [
            "."
          ]
        }
      ],
      "content": [
        {
          "type": "Paragraph",
          "content": [
            {
              "type": "Emphasis",
              "content": [
                "Let ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem3.p1.m1\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
                  "meta": {
                    "altText": "F"
                  }
                },
                " satisfy Assumptions ",
                {
                  "type": "Cite",
                  "target": "Thm112assumption1",
                  "content": [
                    "1"
                  ]
                },
                " and ",
                {
                  "type": "Cite",
                  "target": "Thm112assumption2",
                  "content": [
                    "2"
                  ]
                },
                " (with ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem3.p1.m2\" alttext=\"\\mu=0\" display=\"inline\"><mml:mrow><mml:mi>μ</mml:mi><mml:mo mathvariant=\"normal\">=</mml:mo><mml:mn mathvariant=\"normal\">0</mml:mn></mml:mrow></mml:math>",
                  "meta": {
                    "altText": "\\mu=0"
                  }
                },
                "), and let"
              ]
            }
          ]
        },
        {
          "type": "MathBlock",
          "id": "S3.E11",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.E11.m1\" alttext=\"\\hat{z}^{k}=\\smash[t]{\\frac{1}{k}\\sum_{i=1}^{k}z^{i+1/2}},\" display=\"block\"><mml:mrow><mml:mrow><mml:msup><mml:mover accent=\"true\"><mml:mi>z</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>k</mml:mi></mml:mfrac><mml:mo>⁢</mml:mo><mml:mrow><mml:munderover><mml:mo movablelimits=\"false\">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>k</mml:mi></mml:munderover><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\hat{z}^{k}=\\smash[t]{\\frac{1}{k}\\sum_{i=1}^{k}z^{i+1/2}},"
          }
        },
        {
          "type": "Paragraph",
          "content": [
            {
              "type": "Emphasis",
              "content": [
                "where ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem3.p1.m3\" alttext=\"z^{i+1/2}\" display=\"inline\"><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo mathvariant=\"normal\">+</mml:mo><mml:mrow><mml:mn mathvariant=\"normal\">1</mml:mn><mml:mo mathvariant=\"normal\">/</mml:mo><mml:mn mathvariant=\"normal\">2</mml:mn></mml:mrow></mml:mrow></mml:msup></mml:math>",
                  "meta": {
                    "altText": "z^{i+1/2}"
                  }
                },
                " are generated by algorithm (",
                {
                  "type": "Cite",
                  "target": "S3-E10",
                  "content": [
                    "10"
                  ]
                },
                ") with ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem3.p1.m4\" alttext=\"\\gamma=1/(\\sqrt{2}L)\" display=\"inline\"><mml:mrow><mml:mi>γ</mml:mi><mml:mo mathvariant=\"normal\">=</mml:mo><mml:mrow><mml:mn mathvariant=\"normal\">1</mml:mn><mml:mo mathvariant=\"normal\">/</mml:mo><mml:mrow><mml:mo mathvariant=\"normal\" stretchy=\"false\">(</mml:mo><mml:mrow><mml:msqrt><mml:mn mathvariant=\"normal\">2</mml:mn></mml:msqrt><mml:mo mathvariant=\"italic\">⁢</mml:mo><mml:mi>L</mml:mi></mml:mrow><mml:mo mathvariant=\"normal\" stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>",
                  "meta": {
                    "altText": "\\gamma=1/(\\sqrt{2}L)"
                  }
                },
                ".\nThen, after ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem3.p1.m5\" alttext=\"k\" display=\"inline\"><mml:mi>k</mml:mi></mml:math>",
                  "meta": {
                    "altText": "k"
                  }
                },
                " iterations,"
              ]
            }
          ]
        },
        {
          "type": "MathBlock",
          "id": "S3.E12",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S3.E12.m1\" alttext=\"\\operatorname{Gap}_{\\mathrm{VI}}(\\hat{z}^{k})=\\mathcal{O}\\biggl(\\frac{LD_{\\mathcal{Z},V}^{2}}{k}\\biggr).\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>Gap</mml:mi><mml:mi>VI</mml:mi></mml:msub><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mover accent=\"true\"><mml:mi>z</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mfrac><mml:mrow><mml:mi>L</mml:mi><mml:mo>⁢</mml:mo><mml:msubsup><mml:mi>D</mml:mi><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mo>,</mml:mo><mml:mi>V</mml:mi></mml:mrow><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mi>k</mml:mi></mml:mfrac><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\operatorname{Gap}_{\\mathrm{VI}}(\\hat{z}^{k})=\\mathcal{O}\\biggl(\\frac{LD_{\\mathcal{Z},V}^{2}}{k}\\biggr)."
          }
        }
      ]
    },
    {
      "type": "Paragraph",
      "id": "S3.p12",
      "content": [
        "Numerous extensions of these original iterative methods for solving variational inequalities were published later.\nNotable examples include Tseng’s Forward-Backward Splitting [",
        {
          "type": "Cite",
          "target": "bib-bib120",
          "content": [
            "120"
          ]
        },
        "], Nesterov’s Dual Extrapolation [",
        {
          "type": "Cite",
          "target": "bib-bib95",
          "content": [
            "95"
          ]
        },
        "], and Malitsky and Tam’s Forward-Reflected-Backward [",
        {
          "type": "Cite",
          "target": "bib-bib83",
          "content": [
            "83"
          ]
        },
        "].\nAll of these methods have convergence guarantees of the form (",
        {
          "type": "Cite",
          "target": "S3-E12",
          "content": [
            "12"
          ]
        },
        ").\nIt turns out that this rate is optimal [",
        {
          "type": "Cite",
          "target": "bib-bib101",
          "content": [
            "101"
          ]
        },
        "]."
      ]
    },
    {
      "type": "Heading",
      "id": "S4",
      "depth": 1,
      "content": [
        "4 Stochastic methods: Different setups and assumptions"
      ]
    },
    {
      "type": "Paragraph",
      "content": [
        "In this section, we move from deterministic to stochastic methods, i.e., we consider problem (",
        {
          "type": "Cite",
          "target": "S2-E1",
          "content": [
            "1"
          ]
        },
        ") with an operator of the form"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.E13",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.E13.m1\" alttext=\"F(z)=\\mathbb{E}_{\\xi\\sim\\mathcal{D}}[F_{\\xi}(z)],\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>𝔼</mml:mi><mml:mrow><mml:mi>ξ</mml:mi><mml:mo>∼</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒟</mml:mi></mml:mrow></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">[</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">]</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "F(z)=\\mathbb{E}_{\\xi\\sim\\mathcal{D}}[F_{\\xi}(z)],"
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "where ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.p1.m1\" alttext=\"\\xi\" display=\"inline\"><mml:mi>ξ</mml:mi></mml:math>",
          "meta": {
            "altText": "\\xi"
          }
        },
        " is a random variable, ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.p1.m2\" alttext=\"\\mathcal{D}\" display=\"inline\"><mml:mi class=\"ltx_font_mathcaligraphic\">𝒟</mml:mi></mml:math>",
          "meta": {
            "altText": "\\mathcal{D}"
          }
        },
        " is some (typically unknown) probability distribution and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.p1.m3\" alttext=\"F_{\\xi}\\colon\\mathcal{Z}\\to\\mathbb{R}^{d}\" display=\"inline\"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub><mml:mo lspace=\"0.278em\" rspace=\"0.278em\">:</mml:mo><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mo stretchy=\"false\">→</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:mi>d</mml:mi></mml:msup></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "F_{\\xi}\\colon\\mathcal{Z}\\to\\mathbb{R}^{d}"
          }
        },
        " is a stochastic operator.\nIn this setup, calculating the value of the full operator ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.p1.m4\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
          "meta": {
            "altText": "F"
          }
        },
        " is computationally expensive or even intractable.\nTherefore, one has to work mainly with stochastic realizations ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.p1.m5\" alttext=\"F_{\\xi}\" display=\"inline\"><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub></mml:math>",
          "meta": {
            "altText": "F_{\\xi}"
          }
        },
        "."
      ]
    },
    {
      "type": "Heading",
      "id": "S4.SS1",
      "depth": 2,
      "content": [
        "4.1 General case"
      ]
    },
    {
      "type": "Paragraph",
      "content": [
        "The stochastic formulation (",
        {
          "type": "Cite",
          "target": "S4-E13",
          "content": [
            "13"
          ]
        },
        ") for problem (",
        {
          "type": "Cite",
          "target": "S2-E1",
          "content": [
            "1"
          ]
        },
        ") was first considered by the authors of [",
        {
          "type": "Cite",
          "target": "bib-bib60",
          "content": [
            "60"
          ]
        },
        "].\nThey proposed a natural stochastic generalization of the Extragradient method (more precisely, of the Mirror-Prox method):"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.E14X",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.E14X.m1\" alttext=\"\\displaystyle z^{k+1/2}=\\operatorname{prox}_{z^{k}}(\\gamma F_{\\xi^{k}}(z^{k})),\" display=\"inline\"><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>prox</mml:mi><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:msub><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:msup><mml:mi>ξ</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle z^{k+1/2}=\\operatorname{prox}_{z^{k}}(\\gamma F_{\\xi^{k}}(z^{k})),"
      }
    },
    {
      "type": "MathBlock",
      "id": "S4.E14Xa",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.E14Xa.m1\" alttext=\"\\displaystyle z^{k+1}=\\operatorname{prox}_{z^{k}}(\\gamma F_{\\xi^{k+1/2}}(z^{k+1/2})).\" display=\"inline\"><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>prox</mml:mi><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:msub><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:msup><mml:mi>ξ</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle z^{k+1}=\\operatorname{prox}_{z^{k}}(\\gamma F_{\\xi^{k+1/2}}(z^{k+1/2}))."
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "Note that the random variables ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p1.m1\" alttext=\"\\xi^{k}\" display=\"inline\"><mml:msup><mml:mi>ξ</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:math>",
          "meta": {
            "altText": "\\xi^{k}"
          }
        },
        " and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p1.m2\" alttext=\"\\xi^{k+1/2}\" display=\"inline\"><mml:msup><mml:mi>ξ</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup></mml:math>",
          "meta": {
            "altText": "\\xi^{k+1/2}"
          }
        },
        " are independent, and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p1.m3\" alttext=\"F_{\\xi}(z)\" display=\"inline\"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "F_{\\xi}(z)"
          }
        },
        " is an unbiased estimator of ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p1.m4\" alttext=\"F(z)\" display=\"inline\"><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "F(z)"
          }
        },
        ".\nMoreover, ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p1.m5\" alttext=\"F_{\\xi}(z)\" display=\"inline\"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "F_{\\xi}(z)"
          }
        },
        " is assumed to satisfy the following condition."
      ]
    },
    {
      "type": "Claim",
      "id": "Thm112assumption3",
      "label": "Assumption 3 (Bounded variance).",
      "title": [
        {
          "type": "Strong",
          "content": [
            "Assumption 3"
          ]
        },
        {
          "type": "Strong",
          "content": []
        },
        " (Bounded variance)",
        {
          "type": "Strong",
          "content": [
            "."
          ]
        }
      ],
      "content": [
        {
          "type": "Paragraph",
          "id": "Thm112assumption3.p1",
          "content": [
            "The unbiased operator ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption3.p1.m1\" alttext=\"F_{\\xi}\" display=\"inline\"><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub></mml:math>",
              "meta": {
                "altText": "F_{\\xi}"
              }
            },
            " has uniformly bounded variance, i.e., for all ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption3.p1.m2\" alttext=\"\\xi\\sim\\mathcal{D}\" display=\"inline\"><mml:mrow><mml:mi>ξ</mml:mi><mml:mo>∼</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒟</mml:mi></mml:mrow></mml:math>",
              "meta": {
                "altText": "\\xi\\sim\\mathcal{D}"
              }
            },
            " and ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption3.p1.m3\" alttext=\"u\\in\\mathcal{Z}\" display=\"inline\"><mml:mrow><mml:mi>u</mml:mi><mml:mo>∈</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:mrow></mml:math>",
              "meta": {
                "altText": "u\\in\\mathcal{Z}"
              }
            },
            ", we have ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption3.p1.m4\" alttext=\"\\mathbb{E}\\lVert F_{\\xi}(u)-F(u)\\rVert^{2}_{*}\\leq\\sigma^{2}\" display=\"inline\"><mml:mrow><mml:mrow><mml:mi>𝔼</mml:mi><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msubsup><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>u</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>u</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\" rspace=\"0.1389em\">∥</mml:mo></mml:mrow><mml:mo>*</mml:mo><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mo lspace=\"0.1389em\">≤</mml:mo><mml:msup><mml:mi>σ</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>",
              "meta": {
                "altText": "\\mathbb{E}\\lVert F_{\\xi}(u)-F(u)\\rVert^{2}_{*}\\leq\\sigma^{2}"
              }
            },
            "."
          ]
        }
      ]
    },
    {
      "type": "Paragraph",
      "id": "S4.SS1.p2",
      "content": [
        "Under this assumption, the following result was established in [",
        {
          "type": "Cite",
          "target": "bib-bib60",
          "content": [
            "60"
          ]
        },
        "]."
      ]
    },
    {
      "type": "Claim",
      "id": "Thm112theorem4",
      "claimType": "Theorem",
      "label": "Theorem 4.",
      "title": [
        {
          "type": "Strong",
          "content": [
            "Theorem 4"
          ]
        },
        {
          "type": "Strong",
          "content": [
            "."
          ]
        }
      ],
      "content": [
        {
          "type": "Paragraph",
          "content": [
            {
              "type": "Emphasis",
              "content": [
                "Let ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem4.p1.m1\" alttext=\"F_{\\xi}\" display=\"inline\"><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub></mml:math>",
                  "meta": {
                    "altText": "F_{\\xi}"
                  }
                },
                " satisfy Assumptions ",
                {
                  "type": "Cite",
                  "target": "Thm112assumption1",
                  "content": [
                    "1"
                  ]
                },
                ", ",
                {
                  "type": "Cite",
                  "target": "Thm112assumption2",
                  "content": [
                    "2"
                  ]
                },
                " (with ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem4.p1.m2\" alttext=\"\\mu=0\" display=\"inline\"><mml:mrow><mml:mi>μ</mml:mi><mml:mo mathvariant=\"normal\">=</mml:mo><mml:mn mathvariant=\"normal\">0</mml:mn></mml:mrow></mml:math>",
                  "meta": {
                    "altText": "\\mu=0"
                  }
                },
                ") and ",
                {
                  "type": "Cite",
                  "target": "Thm112assumption3",
                  "content": [
                    "3"
                  ]
                },
                ", and let ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem4.p1.m3\" alttext=\"\\hat{z}^{k}\" display=\"inline\"><mml:msup><mml:mover accent=\"true\"><mml:mi>z</mml:mi><mml:mo mathvariant=\"normal\">^</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msup></mml:math>",
                  "meta": {
                    "altText": "\\hat{z}^{k}"
                  }
                },
                " be defined as in (",
                {
                  "type": "Cite",
                  "target": "S3-E11",
                  "content": [
                    "11"
                  ]
                },
                "), where ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem4.p1.m4\" alttext=\"z^{i+1/2}\" display=\"inline\"><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo mathvariant=\"normal\">+</mml:mo><mml:mrow><mml:mn mathvariant=\"normal\">1</mml:mn><mml:mo mathvariant=\"normal\">/</mml:mo><mml:mn mathvariant=\"normal\">2</mml:mn></mml:mrow></mml:mrow></mml:msup></mml:math>",
                  "meta": {
                    "altText": "z^{i+1/2}"
                  }
                },
                " are generated by algorithm (",
                {
                  "type": "Cite",
                  "target": "S4-E14",
                  "content": [
                    "14"
                  ]
                },
                ") with ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem4.p1.m5\" alttext=\"\\gamma=\\min\\bigl\\{\\frac{1}{\\sqrt{3}L},D_{\\mathcal{Z},V}\\sqrt{\\frac{1}{7k\\sigma^{2}}}\\bigr\\}\" display=\"inline\"><mml:mrow><mml:mi>γ</mml:mi><mml:mo mathvariant=\"normal\">=</mml:mo><mml:mrow><mml:mi mathvariant=\"normal\">min</mml:mi><mml:mo mathvariant=\"italic\">⁡</mml:mo><mml:mrow><mml:mo mathvariant=\"normal\" maxsize=\"120%\" minsize=\"120%\">{</mml:mo><mml:mfrac><mml:mn mathvariant=\"normal\">1</mml:mn><mml:mrow><mml:msqrt><mml:mn mathvariant=\"normal\">3</mml:mn></mml:msqrt><mml:mo mathvariant=\"italic\">⁢</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:mfrac><mml:mo mathvariant=\"normal\">,</mml:mo><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mo mathvariant=\"normal\">,</mml:mo><mml:mi>V</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant=\"italic\">⁢</mml:mo><mml:msqrt><mml:mfrac><mml:mn mathvariant=\"normal\">1</mml:mn><mml:mrow><mml:mn mathvariant=\"normal\">7</mml:mn><mml:mo mathvariant=\"italic\">⁢</mml:mo><mml:mi>k</mml:mi><mml:mo mathvariant=\"italic\">⁢</mml:mo><mml:msup><mml:mi>σ</mml:mi><mml:mn mathvariant=\"normal\">2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:msqrt></mml:mrow><mml:mo mathvariant=\"normal\" maxsize=\"120%\" minsize=\"120%\">}</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>",
                  "meta": {
                    "altText": "\\gamma=\\min\\bigl\\{\\frac{1}{\\sqrt{3}L},D_{\\mathcal{Z},V}\\sqrt{\\frac{1}{7k\\sigma^{2}}}\\bigr\\}"
                  }
                },
                ".\nThen, after ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem4.p1.m6\" alttext=\"k\" display=\"inline\"><mml:mi>k</mml:mi></mml:math>",
                  "meta": {
                    "altText": "k"
                  }
                },
                " iterations, one can guarantee that"
              ]
            }
          ]
        },
        {
          "type": "MathBlock",
          "id": "S4.E15",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.E15.m1\" alttext=\"\\mathbb{E}[\\operatorname{Gap}_{\\mathrm{VI}}(\\hat{z}^{k})]=\\mathcal{O}\\biggl(\\frac{LD^{2}_{\\mathcal{Z},V}}{k}+D_{\\mathcal{Z},V}\\sqrt{\\frac{\\sigma^{2}}{k}}\\biggr).\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:mi>𝔼</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">[</mml:mo><mml:mrow><mml:msub><mml:mi>Gap</mml:mi><mml:mi>VI</mml:mi></mml:msub><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mover accent=\"true\"><mml:mi>z</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">]</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>L</mml:mi><mml:mo>⁢</mml:mo><mml:msubsup><mml:mi>D</mml:mi><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mo>,</mml:mo><mml:mi>V</mml:mi></mml:mrow><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mi>k</mml:mi></mml:mfrac><mml:mo>+</mml:mo><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mo>,</mml:mo><mml:mi>V</mml:mi></mml:mrow></mml:msub><mml:mo>⁢</mml:mo><mml:msqrt><mml:mfrac><mml:msup><mml:mi>σ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mi>k</mml:mi></mml:mfrac></mml:msqrt></mml:mrow></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\mathbb{E}[\\operatorname{Gap}_{\\mathrm{VI}}(\\hat{z}^{k})]=\\mathcal{O}\\biggl(\\frac{LD^{2}_{\\mathcal{Z},V}}{k}+D_{\\mathcal{Z},V}\\sqrt{\\frac{\\sigma^{2}}{k}}\\biggr)."
          }
        }
      ]
    },
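    {
      "type": "Paragraph",
      "content": [
        "As a rough reading of Theorem 4 (a sketch that ignores numerical constants; the target accuracy ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"\\varepsilon\" display=\"inline\"><mml:mi>ε</mml:mi></mml:math>",
          "meta": {
            "altText": "\\varepsilon"
          }
        },
        " is a symbol introduced here for illustration and does not appear in the original statement), requiring each of the two terms in the bound to be at most ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"\\varepsilon\" display=\"inline\"><mml:mi>ε</mml:mi></mml:math>",
          "meta": {
            "altText": "\\varepsilon"
          }
        },
        " suggests that an expected gap of this order is reached after"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"k=\\mathcal{O}\\biggl(\\frac{LD^{2}_{\\mathcal{Z},V}}{\\varepsilon}+\\frac{\\sigma^{2}D^{2}_{\\mathcal{Z},V}}{\\varepsilon^{2}}\\biggr)\" display=\"block\"><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>L</mml:mi><mml:mo>&#x2062;</mml:mo><mml:msubsup><mml:mi>D</mml:mi><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mo>,</mml:mo><mml:mi>V</mml:mi></mml:mrow><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mi>ε</mml:mi></mml:mfrac><mml:mo>+</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mi>σ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>&#x2062;</mml:mo><mml:msubsup><mml:mi>D</mml:mi><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mo>,</mml:mo><mml:mi>V</mml:mi></mml:mrow><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:msup><mml:mi>ε</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mfrac></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>",
      "meta": {
        "altText": "k=\\mathcal{O}\\biggl(\\frac{LD^{2}_{\\mathcal{Z},V}}{\\varepsilon}+\\frac{\\sigma^{2}D^{2}_{\\mathcal{Z},V}}{\\varepsilon^{2}}\\biggr)"
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "iterations.\nThe first term corresponds to the deterministic rate, while the second is due to the variance and dominates when high accuracy is required.\nSince the step size in Theorem 4 depends on the number of iterations, this number has to be fixed in advance."
      ]
    },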
    {
      "type": "Paragraph",
      "content": [
        "In [",
        {
          "type": "Cite",
          "target": "bib-bib17",
          "content": [
            "17"
          ]
        },
        "], the authors analyzed algorithm (",
        {
          "type": "Cite",
          "target": "S4-E14",
          "content": [
            "14"
          ]
        },
        ") for strongly monotone VIs in the Euclidean case.\nIn particular, under Assumptions ",
        {
          "type": "Cite",
          "target": "Thm112assumption1",
          "content": [
            "1"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "Thm112assumption2",
          "content": [
            "2"
          ]
        },
        " and ",
        {
          "type": "Cite",
          "target": "Thm112assumption3",
          "content": [
            "3"
          ]
        },
        " one can guarantee that after ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p3.m1\" alttext=\"k\" display=\"inline\"><mml:mi>k</mml:mi></mml:math>",
          "meta": {
            "altText": "k"
          }
        },
        " iterations of method (",
        {
          "type": "Cite",
          "target": "S4-E14",
          "content": [
            "14"
          ]
        },
        ") the following estimate holds (here and below, we omit numerical constants in the exponential factor)"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.E16",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.E16.m1\" alttext=\"\\mathbb{E}\\lVert z^{k}-z^{*}\\rVert^{2}_{2}=\\tilde{\\mathcal{O}}\\biggl(R_{0}^{2}\\exp\\biggl(-\\frac{\\mu k}{L}\\biggr)+\\frac{\\sigma^{2}}{\\mu^{2}k}\\biggr).\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:mi>𝔼</mml:mi><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msubsup><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\" rspace=\"0.1389em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mo lspace=\"0.1389em\">=</mml:mo><mml:mrow><mml:mover accent=\"true\"><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>~</mml:mo></mml:mover><mml:mo>⁢</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mi>exp</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:mo>−</mml:mo><mml:mfrac><mml:mrow><mml:mi>μ</mml:mi><mml:mo>⁢</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mi>L</mml:mi></mml:mfrac></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mfrac><mml:msup><mml:mi>σ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mrow><mml:msup><mml:mi>μ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>⁢</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\mathbb{E}\\lVert z^{k}-z^{*}\\rVert^{2}_{2}=\\tilde{\\mathcal{O}}\\biggl(R_{0}^{2}\\exp\\biggl(-\\frac{\\mu k}{L}\\biggr)+\\frac{\\sigma^{2}}{\\mu^{2}k}\\biggr)."
      }
    },
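    {
      "type": "Paragraph",
      "content": [
        "As an illustrative reading of this estimate (a sketch only: the target accuracy ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"\\varepsilon\" display=\"inline\"><mml:mi>ε</mml:mi></mml:math>",
          "meta": {
            "altText": "\\varepsilon"
          }
        },
        " is a symbol introduced here and not used in the original text, and logarithmic factors are hidden in the notation), bounding each of the two terms by ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"\\varepsilon\" display=\"inline\"><mml:mi>ε</mml:mi></mml:math>",
          "meta": {
            "altText": "\\varepsilon"
          }
        },
        " suggests that an expected squared distance to the solution of this order is reached after"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"k=\\tilde{\\mathcal{O}}\\biggl(\\frac{L}{\\mu}+\\frac{\\sigma^{2}}{\\mu^{2}\\varepsilon}\\biggr)\" display=\"block\"><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mover accent=\"true\"><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>~</mml:mo></mml:mover><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:mfrac><mml:mi>L</mml:mi><mml:mi>μ</mml:mi></mml:mfrac><mml:mo>+</mml:mo><mml:mfrac><mml:msup><mml:mi>σ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mrow><mml:msup><mml:mi>μ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>&#x2062;</mml:mo><mml:mi>ε</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>",
      "meta": {
        "altText": "k=\\tilde{\\mathcal{O}}\\biggl(\\frac{L}{\\mu}+\\frac{\\sigma^{2}}{\\mu^{2}\\varepsilon}\\biggr)"
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "iterations.\nThe first term comes from the linearly decaying part of the estimate, and the second from the variance term."
      ]
    },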
    {
      "type": "Paragraph",
      "content": [
        "Also in [",
        {
          "type": "Cite",
          "target": "bib-bib17",
          "content": [
            "17"
          ]
        },
        "], the authors obtained lower complexity bounds for solving VIs satisfying Assumptions ",
        {
          "type": "Cite",
          "target": "Thm112assumption1",
          "content": [
            "1"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "Thm112assumption2",
          "content": [
            "2"
          ]
        },
        " and ",
        {
          "type": "Cite",
          "target": "Thm112assumption3",
          "content": [
            "3"
          ]
        },
        " with stochastic methods.\nIt turns out that the conclusions of Theorem ",
        {
          "type": "Cite",
          "target": "Thm112theorem4",
          "content": [
            "4"
          ]
        },
        " in the monotone case and estimate (",
        {
          "type": "Cite",
          "target": "S4-E16",
          "content": [
            "16"
          ]
        },
        ") are optimal: they match the lower bounds up to numerical constants."
      ]
    },
    {
      "type": "Paragraph",
      "content": [
        "Optimistic-like (or single-call) methods were also investigated in the stochastic setting.\nThe work [",
        {
          "type": "Cite",
          "target": "bib-bib41",
          "content": [
            "41"
          ]
        },
        "] applies the following update scheme:"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.E17X",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.E17X.m1\" alttext=\"\\displaystyle z^{k+1/2}=P_{\\mathcal{Z}}(z^{k}-\\gamma F_{\\xi^{k-1/2}}(z^{k-1/2})),\" display=\"inline\"><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:msup><mml:mi>ξ</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>−</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>−</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle z^{k+1/2}=P_{\\mathcal{Z}}(z^{k}-\\gamma F_{\\xi^{k-1/2}}(z^{k-1/2})),"
      }
    },
    {
      "type": "MathBlock",
      "id": "S4.E17Xa",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.E17Xa.m1\" alttext=\"\\displaystyle z^{k+1}=P_{\\mathcal{Z}}(z^{k}-\\gamma F_{\\xi^{k+1/2}}(z^{k+1/2})).\" display=\"inline\"><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:msup><mml:mi>ξ</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle z^{k+1}=P_{\\mathcal{Z}}(z^{k}-\\gamma F_{\\xi^{k+1/2}}(z^{k+1/2}))."
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "For this method, in the monotone Euclidean case, the authors proved an estimate similar to (",
        {
          "type": "Cite",
          "target": "S4-E15",
          "content": [
            "15"
          ]
        },
        ").\nIn the strongly monotone case, method (",
        {
          "type": "Cite",
          "target": "S4-E17",
          "content": [
            "17"
          ]
        },
        ") was investigated in the paper [",
        {
          "type": "Cite",
          "target": "bib-bib54",
          "content": [
            "54"
          ]
        },
        "], but the estimates obtained there do not meet the lower bounds.\nThe optimal estimates for this scheme were obtained later in [",
        {
          "type": "Cite",
          "target": "bib-bib14",
          "content": [
            "14"
          ]
        },
        "]."
      ]
    },
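    {
      "type": "Paragraph",
      "content": [
        "To see why scheme (",
        {
          "type": "Cite",
          "target": "S4-E17",
          "content": [
            "17"
          ]
        },
        ") is called single-call, consider the following short derivation. It is only a sketch under the simplifying assumption ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\mathcal{Z}=\\mathbb{R}^{d}"
        },
        " (so that the projection is the identity), and the shorthand ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "g^{k+1/2}=F_{\\xi^{k+1/2}}(z^{k+1/2})"
        },
        " is introduced here for illustration only. Combining the two steps at iteration ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "k"
        },
        " with the first step at iteration ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "k+1"
        },
        " gives"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "tex",
      "text": "z^{k+3/2}=z^{k+1}-\\gamma g^{k+1/2}=z^{k}-2\\gamma g^{k+1/2},\\qquad z^{k+1/2}=z^{k}-\\gamma g^{k-1/2}\\quad\\Longrightarrow\\quad z^{k+3/2}=z^{k+1/2}-2\\gamma g^{k+1/2}+\\gamma g^{k-1/2}."
    },
    {
      "type": "Paragraph",
      "content": [
        "In this form, each iteration needs only one new stochastic operator evaluation, ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "g^{k+1/2}"
        },
        ", and reuses the one computed at the previous iteration, whereas the extragradient-type method (",
        {
          "type": "Cite",
          "target": "S4-E14",
          "content": [
            "14"
          ]
        },
        ") requires two evaluations per iteration."
      ]
    },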
    {
      "type": "Paragraph",
      "content": [
        "The work [",
        {
          "type": "Cite",
          "target": "bib-bib66",
          "content": [
            "66"
          ]
        },
        "] deals with a slightly different single-call approach in the non-Euclidean case:"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.E18",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.E18.m1\" alttext=\"z^{k+1}=\\operatorname{prox}_{z^{k}}\\bigl(\\gamma_{k}F_{\\xi^{k}}(z^{k})+\\gamma_{k}\\alpha_{k}[F_{\\xi^{k}}(z^{k})-F_{\\xi^{k-1}}(z^{k-1})]\\bigr).\" display=\"block\"><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>prox</mml:mi><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:msub><mml:mo>⁡</mml:mo><mml:mrow><mml:mo maxsize=\"120%\" minsize=\"120%\">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>γ</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:msup><mml:mi>ξ</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:msub><mml:mi>γ</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:msub><mml:mi>α</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">[</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:msup><mml:mi>ξ</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:msup><mml:mi>ξ</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">]</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo maxsize=\"120%\" minsize=\"120%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "z^{k+1}=\\operatorname{prox}_{z^{k}}\\bigl(\\gamma_{k}F_{\\xi^{k}}(z^{k})+\\gamma_{k}\\alpha_{k}[F_{\\xi^{k}}(z^{k})-F_{\\xi^{k-1}}(z^{k-1})]\\bigr)."
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "This update rule is a modification of the Forward-Reflected-Backward approach: here ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p5.m1\" alttext=\"\\alpha_{k}\" display=\"inline\"><mml:msub><mml:mi>α</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:math>",
          "meta": {
            "altText": "\\alpha_{k}"
          }
        },
        " is a parameter, while in [",
        {
          "type": "Cite",
          "target": "bib-bib83",
          "content": [
            "83"
          ]
        },
        "], ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p5.m2\" alttext=\"\\alpha_{k}\\equiv 1\" display=\"inline\"><mml:mrow><mml:msub><mml:mi>α</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>≡</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\alpha_{k}\\equiv 1"
          }
        },
        ".\nThe analysis of method (",
        {
          "type": "Cite",
          "target": "S4-E18",
          "content": [
            "18"
          ]
        },
        ") gives optimal estimates in both the strongly monotone and monotone cases."
      ]
    },
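    {
      "type": "Paragraph",
      "content": [
        "For intuition, consider the Euclidean unconstrained case. Assuming (for this illustration only) that the prox-mapping is generated by ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "V(z,u)=\\tfrac{1}{2}\\lVert z-u\\rVert_{2}^{2}"
        },
        " on ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\mathcal{Z}=\\mathbb{R}^{d}"
        },
        ", so that ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\operatorname{prox}_{z}(g)=z-g"
        },
        ", update (",
        {
          "type": "Cite",
          "target": "S4-E18",
          "content": [
            "18"
          ]
        },
        ") reads"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "tex",
      "text": "z^{k+1}=z^{k}-\\gamma_{k}(1+\\alpha_{k})F_{\\xi^{k}}(z^{k})+\\gamma_{k}\\alpha_{k}F_{\\xi^{k-1}}(z^{k-1}),\\qquad\\text{and for }\\alpha_{k}\\equiv 1:\\quad z^{k+1}=z^{k}-2\\gamma_{k}F_{\\xi^{k}}(z^{k})+\\gamma_{k}F_{\\xi^{k-1}}(z^{k-1})."
    },
    {
      "type": "Paragraph",
      "content": [
        "Each iteration therefore uses one fresh stochastic operator evaluation and reuses the value stored from the previous iteration."
      ]
    },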
    {
      "type": "Paragraph",
      "content": [
        "The theoretical results and guarantees discussed above rely significantly on the bounded variance assumption (Assumption ",
        {
          "type": "Cite",
          "target": "Thm112assumption3",
          "content": [
            "3"
          ]
        },
        ").\nThis assumption is quite restrictive (especially when the domain is unbounded) and does not hold for many popular machine learning problems.\nMoreover, one can even design a strongly monotone variational inequality on an unbounded domain such that method (",
        {
          "type": "Cite",
          "target": "S4-E14",
          "content": [
            "14"
          ]
        },
        ") ",
        {
          "type": "Emphasis",
          "content": [
            "diverges"
          ]
        },
        " exponentially fast [",
        {
          "type": "Cite",
          "target": "bib-bib26",
          "content": [
            "26"
          ]
        },
        "].\nThe authors of [",
        {
          "type": "Cite",
          "target": "bib-bib55",
          "content": [
            "55"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib48",
          "content": [
            "48"
          ]
        },
        "] consider a relaxed form of the bounded variance condition and assume that ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p6.m1\" alttext=\"\\mathbb{E}\\lVert F_{\\xi}(u)-F(u)\\rVert^{2}_{2}\\leq\\sigma^{2}+\\delta\\lVert u-z^{*}\\rVert_{2}^{2}\" display=\"inline\"><mml:mrow><mml:mrow><mml:mi>𝔼</mml:mi><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msubsup><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>u</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>u</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\" rspace=\"0.1389em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mo lspace=\"0.1389em\">≤</mml:mo><mml:mrow><mml:msup><mml:mi>σ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:mrow><mml:mi>δ</mml:mi><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msubsup><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\mathbb{E}\\lVert F_{\\xi}(u)-F(u)\\rVert^{2}_{2}\\leq\\sigma^{2}+\\delta\\lVert u-z^{*}\\rVert_{2}^{2}"
          }
        },
        " with ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p6.m2\" alttext=\"\\delta\\geq 0\" display=\"inline\"><mml:mrow><mml:mi>δ</mml:mi><mml:mo>≥</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\delta\\geq 0"
          }
        },
        " in the Euclidean case.\nUnder this condition and Assumptions ",
        {
          "type": "Cite",
          "target": "Thm112assumption1",
          "content": [
            "1"
          ]
        },
        " and ",
        {
          "type": "Cite",
          "target": "Thm112assumption2",
          "content": [
            "2"
          ]
        },
        ", it is proved in [",
        {
          "type": "Cite",
          "target": "bib-bib48",
          "content": [
            "48"
          ]
        },
        "] that after ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p6.m3\" alttext=\"k\" display=\"inline\"><mml:mi>k</mml:mi></mml:math>",
          "meta": {
            "altText": "k"
          }
        },
        " iterations of algorithm (",
        {
          "type": "Cite",
          "target": "S4-E14",
          "content": [
            "14"
          ]
        },
        ") (when ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p6.m4\" alttext=\"\\mathcal{Z}=\\mathbb{R}^{d}\" display=\"inline\"><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:mi>d</mml:mi></mml:msup></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\mathcal{Z}=\\mathbb{R}^{d}"
          }
        },
        ") it holds that"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.E19",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.E19.m1\" alttext=\"\\mathbb{E}\\lVert z^{k}-z^{*}\\rVert^{2}_{2}=\\mathcal{O}\\biggl(\\kappa R^{2}_{0}\\exp\\biggl(-\\frac{k}{\\kappa}\\biggr)+\\frac{\\sigma^{2}}{\\mu^{2}k}\\biggr),\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:mi>𝔼</mml:mi><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msubsup><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\" rspace=\"0.1389em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mo lspace=\"0.1389em\">=</mml:mo><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:mrow><mml:mi>κ</mml:mi><mml:mo>⁢</mml:mo><mml:msubsup><mml:mi>R</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mi>exp</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:mo>−</mml:mo><mml:mfrac><mml:mi>k</mml:mi><mml:mi>κ</mml:mi></mml:mfrac></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mfrac><mml:msup><mml:mi>σ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mrow><mml:msup><mml:mi>μ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>⁢</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\mathbb{E}\\lVert z^{k}-z^{*}\\rVert^{2}_{2}=\\mathcal{O}\\biggl(\\kappa R^{2}_{0}\\exp\\biggl(-\\frac{k}{\\kappa}\\biggr)+\\frac{\\sigma^{2}}{\\mu^{2}k}\\biggr),"
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "where ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p6.m5\" alttext=\"\\kappa=\\max\\bigl\\{\\frac{\\delta}{\\mu^{2}};\\frac{L+\\sqrt{\\delta}}{\\mu}\\bigr\\}\" display=\"inline\"><mml:mrow><mml:mi>κ</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mi>max</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo maxsize=\"120%\" minsize=\"120%\">{</mml:mo><mml:mfrac><mml:mi>δ</mml:mi><mml:msup><mml:mi>μ</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mfrac><mml:mo>;</mml:mo><mml:mfrac><mml:mrow><mml:mi>L</mml:mi><mml:mo>+</mml:mo><mml:msqrt><mml:mi>δ</mml:mi></mml:msqrt></mml:mrow><mml:mi>μ</mml:mi></mml:mfrac><mml:mo maxsize=\"120%\" minsize=\"120%\">}</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\kappa=\\max\\bigl\\{\\frac{\\delta}{\\mu^{2}};\\frac{L+\\sqrt{\\delta}}{\\mu}\\bigr\\}"
          }
        },
        ".\nThe same assumption on stochastic realizations was considered in [",
        {
          "type": "Cite",
          "target": "bib-bib67",
          "content": [
            "67"
          ]
        },
        "], where method (",
        {
          "type": "Cite",
          "target": "S4-E18",
          "content": [
            "18"
          ]
        },
        ") was used, yielding the estimate"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.E20",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.E20.m1\" alttext=\"\\mathbb{E}\\lVert z^{k}-z^{*}\\rVert^{2}_{2}=\\mathcal{O}\\biggl(R^{2}_{0}\\exp\\biggl(-\\frac{\\mu k}{L}\\biggr)+\\frac{\\sigma^{2}+\\delta^{2}D^{2}_{\\mathcal{Z}}}{\\mu^{2}k}\\biggr).\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:mi>𝔼</mml:mi><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msubsup><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\" rspace=\"0.1389em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mo lspace=\"0.1389em\">=</mml:mo><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mi>exp</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:mo>−</mml:mo><mml:mfrac><mml:mrow><mml:mi>μ</mml:mi><mml:mo>⁢</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mi>L</mml:mi></mml:mfrac></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mi>σ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:mrow><mml:msup><mml:mi>δ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>⁢</mml:mo><mml:msubsup><mml:mi>D</mml:mi><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mrow><mml:mrow><mml:msup><mml:mi>μ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>⁢</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\mathbb{E}\\lVert z^{k}-z^{*}\\rVert^{2}_{2}=\\mathcal{O}\\biggl(R^{2}_{0}\\exp\\biggl(-\\frac{\\mu k}{L}\\biggr)+\\frac{\\sigma^{2}+\\delta^{2}D^{2}_{\\mathcal{Z}}}{\\mu^{2}k}\\biggr)."
      }
    },
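    {
      "type": "Paragraph",
      "content": [
        "As a consistency check on (",
        {
          "type": "Cite",
          "target": "S4-E19",
          "content": [
            "19"
          ]
        },
        "), which uses only the definitions above, set ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\delta=0"
        },
        ". The relaxed condition then reduces to bounded variance, ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\kappa=\\max\\bigl\\{0;\\tfrac{L}{\\mu}\\bigr\\}=\\tfrac{L}{\\mu}"
        },
        ", and (",
        {
          "type": "Cite",
          "target": "S4-E19",
          "content": [
            "19"
          ]
        },
        ") becomes"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "tex",
      "text": "\\mathbb{E}\\lVert z^{k}-z^{*}\\rVert^{2}_{2}=\\mathcal{O}\\biggl(\\frac{L}{\\mu}R_{0}^{2}\\exp\\biggl(-\\frac{\\mu k}{L}\\biggr)+\\frac{\\sigma^{2}}{\\mu^{2}k}\\biggr),"
    },
    {
      "type": "Paragraph",
      "content": [
        "which has the same form as (",
        {
          "type": "Cite",
          "target": "S4-E16",
          "content": [
            "16"
          ]
        },
        "), apart from the extra factor ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "L/\\mu"
        },
        " in front of the exponential term and the logarithmic factors hidden in ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\tilde{\\mathcal{O}}"
        },
        "."
      ]
    },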
    {
      "type": "Paragraph",
      "content": [
        "Estimates (",
        {
          "type": "Cite",
          "target": "S4-E19",
          "content": [
            "19"
          ]
        },
        ") and (",
        {
          "type": "Cite",
          "target": "S4-E20",
          "content": [
            "20"
          ]
        },
        ") are competitive: the former is superior in terms of the stochastic term (second term), while the latter is superior in terms of the deterministic term (first term).\nHowever, neither result fully resolves the bounded variance issue, because the relaxed condition considered above is still not general.\nThe key to avoiding the bounded variance assumption on ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p6.m6\" alttext=\"F_{\\xi}\" display=\"inline\"><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub></mml:math>",
          "meta": {
            "altText": "F_{\\xi}"
          }
        },
        " lies in the way stochasticity is generated in method (",
        {
          "type": "Cite",
          "target": "S4-E14",
          "content": [
            "14"
          ]
        },
        ").\nMethod (",
        {
          "type": "Cite",
          "target": "S4-E14",
          "content": [
            "14"
          ]
        },
        ") is sometimes called Independent Samples Stochastic Extragradient (I-SEG).\nTo address the bounded variance issue, K. Mishchenko et al. [",
        {
          "type": "Cite",
          "target": "bib-bib86",
          "content": [
            "86"
          ]
        },
        "] proposed another stochastic modification of the Extragradient algorithm, called Same Sample Stochastic Extragradient (S-SEG):"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.Ex8X",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex8X.m1\" alttext=\"\\displaystyle z^{k+1/2}=z^{k}-\\gamma F_{\\xi^{k}}(z^{k}),\" display=\"inline\"><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:msup><mml:mi>ξ</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle z^{k+1/2}=z^{k}-\\gamma F_{\\xi^{k}}(z^{k}),"
      }
    },
    {
      "type": "MathBlock",
      "id": "S4.Ex8Xa",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex8Xa.m1\" alttext=\"\\displaystyle z^{k+1}=z^{k}-\\gamma F_{\\xi^{k}}(z^{k+1/2}).\" display=\"inline\"><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:msup><mml:mi>ξ</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle z^{k+1}=z^{k}-\\gamma F_{\\xi^{k}}(z^{k+1/2})."
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "For simplicity, we present the above method for the case when ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p6.m7\" alttext=\"\\mathcal{Z}=\\mathbb{R}^{d}\" display=\"inline\"><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:mi>d</mml:mi></mml:msup></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\mathcal{Z}=\\mathbb{R}^{d}"
          }
        },
        " (",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p6.m8\" alttext=\"F(z^{*})=0\" display=\"inline\"><mml:mrow><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math>",
          "meta": {
            "altText": "F(z^{*})=0"
          }
        },
        "), and refer the reader to [",
        {
          "type": "Cite",
          "target": "bib-bib86",
          "content": [
            "86"
          ]
        },
        "] for a more general case of regularized VIs.\nIn contrast to I-SEG, S-SEG uses the same sample ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p6.m9\" alttext=\"\\xi^{k}\" display=\"inline\"><mml:msup><mml:mi>ξ</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:math>",
          "meta": {
            "altText": "\\xi^{k}"
          }
        },
        " for both steps at iteration ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p6.m10\" alttext=\"k\" display=\"inline\"><mml:mi>k</mml:mi></mml:math>",
          "meta": {
            "altText": "k"
          }
        },
        ".\nAlthough such a strategy cannot be implemented in some scenarios (e.g., with a streaming oracle), it can be applied to finite-sum problems, which have been gaining increasing attention in recent years.\nMoreover, S-SEG relies significantly on the following assumption."
      ]
    },
    {
      "type": "Claim",
      "id": "Thm112assumption4",
      "label": "Assumption 4.",
      "title": [
        {
          "type": "Strong",
          "content": [
            "Assumption 4"
          ]
        },
        {
          "type": "Strong",
          "content": [
            "."
          ]
        }
      ],
      "content": [
        {
          "type": "Paragraph",
          "id": "Thm112assumption4.p1",
          "content": [
            "The operator ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption4.p1.m1\" alttext=\"F_{\\xi}(z)\" display=\"inline\"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
              "meta": {
                "altText": "F_{\\xi}(z)"
              }
            },
            " is ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption4.p1.m2\" alttext=\"L\" display=\"inline\"><mml:mi>L</mml:mi></mml:math>",
              "meta": {
                "altText": "L"
              }
            },
            "-Lipschitz and ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption4.p1.m3\" alttext=\"\\mu\" display=\"inline\"><mml:mi>μ</mml:mi></mml:math>",
              "meta": {
                "altText": "\\mu"
              }
            },
            "-strongly monotone almost surely in ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption4.p1.m4\" alttext=\"\\xi\" display=\"inline\"><mml:mi>ξ</mml:mi></mml:math>",
              "meta": {
                "altText": "\\xi"
              }
            },
            ", i.e., ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption4.p1.m5\" alttext=\"\\lVert F_{\\xi}(z)-F_{\\xi}(z^{\\prime})\\rVert_{2}\\leq L\\lVert z-z^{\\prime}\\rVert_{2}\" display=\"inline\"><mml:mrow><mml:msub><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msub><mml:mo>≤</mml:mo><mml:mrow><mml:mi>L</mml:mi><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msub><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:mi>z</mml:mi><mml:mo>−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:mrow></mml:math>",
              "meta": {
                "altText": "\\lVert F_{\\xi}(z)-F_{\\xi}(z^{\\prime})\\rVert_{2}\\leq L\\lVert z-z^{\\prime}\\rVert_{2}"
              }
            },
            " and ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption4.p1.m6\" alttext=\"\\langle F_{\\xi}(z)-F_{\\xi}(z^{\\prime}),z-z^{\\prime}\\rangle\\geq\\mu\\lVert z-z^{\\prime}\\rVert_{2}^{2}\" display=\"inline\"><mml:mrow><mml:mrow><mml:mo stretchy=\"false\">⟨</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi>z</mml:mi><mml:mo>−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow><mml:mo stretchy=\"false\">⟩</mml:mo></mml:mrow><mml:mo>≥</mml:mo><mml:mrow><mml:mi>μ</mml:mi><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msubsup><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:mi>z</mml:mi><mml:mo>−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mrow></mml:math>",
              "meta": {
                "altText": "\\langle F_{\\xi}(z)-F_{\\xi}(z^{\\prime}),z-z^{\\prime}\\rangle\\geq\\mu\\lVert z-z^{\\prime}\\rVert_{2}^{2}"
              }
            },
            " for all ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption4.p1.m7\" alttext=\"z,z^{\\prime}\\in\\mathbb{R}^{d}\" display=\"inline\"><mml:mrow><mml:mrow><mml:mi>z</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow><mml:mo>∈</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:mi>d</mml:mi></mml:msup></mml:mrow></mml:math>",
              "meta": {
                "altText": "z,z^{\\prime}\\in\\mathbb{R}^{d}"
              }
            },
            ", almost surely in ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption4.p1.m8\" alttext=\"\\xi\" display=\"inline\"><mml:mi>ξ</mml:mi></mml:math>",
              "meta": {
                "altText": "\\xi"
              }
            },
            "."
          ]
        }
      ]
    },
    {
      "type": "Paragraph",
      "content": [
        "The evident difference between the I-SEG and S-SEG setups can be explained through the connection between the Extragradient and the Proximal Point (PP) methods [",
        {
          "type": "Cite",
          "target": "bib-bib84",
          "content": [
            "84"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib107",
          "content": [
            "107"
          ]
        },
        "].\nIn the rest of this subs-section we assume that ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p7.m1\" alttext=\"\\mathcal{Z}=\\mathbb{R}^{d}\" display=\"inline\"><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:mi>d</mml:mi></mml:msup></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\mathcal{Z}=\\mathbb{R}^{d}"
          }
        },
        " (",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p7.m2\" alttext=\"F(z^{*})=0\" display=\"inline\"><mml:mrow><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math>",
          "meta": {
            "altText": "F(z^{*})=0"
          }
        },
        ").\nIn this setup, PP has the update rule"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.Ex9",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex9.m1\" alttext=\"z^{k+1}=z^{k}-\\gamma F(z^{k+1}).\" display=\"block\"><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "z^{k+1}=z^{k}-\\gamma F(z^{k+1})."
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "The method converges for any monotone operator ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p7.m3\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
          "meta": {
            "altText": "F"
          }
        },
        " and any ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p7.m4\" alttext=\"\\gamma>0\" display=\"inline\"><mml:mrow><mml:mi>γ</mml:mi><mml:mo>&gt;</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\gamma>0"
          }
        },
        ".\nHowever, the update rule of PP is implicit and in many situations it cannot be computed efficiently.\nThe Extragradient method can be seen as a natural approximation of PP that substitutes ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p7.m5\" alttext=\"z^{k+1}\" display=\"inline\"><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:math>",
          "meta": {
            "altText": "z^{k+1}"
          }
        },
        " in the right-hand side by one gradient step from ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p7.m6\" alttext=\"z^{k}\" display=\"inline\"><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:math>",
          "meta": {
            "altText": "z^{k}"
          }
        },
        ":"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.Ex10",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex10.m1\" alttext=\"z^{k+1}=z^{k}-\\gamma F(z^{k}-\\gamma F(z^{k})).\" display=\"block\"><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "z^{k+1}=z^{k}-\\gamma F(z^{k}-\\gamma F(z^{k}))."
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "In addition, when ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p7.m7\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
          "meta": {
            "altText": "F"
          }
        },
        " is ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p7.m8\" alttext=\"L\" display=\"inline\"><mml:mi>L</mml:mi></mml:math>",
          "meta": {
            "altText": "L"
          }
        },
        "-Lipschitz, one can estimate how good the approximation is.\nConsider ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p7.m9\" alttext=\"z^{k+1}=z^{k}-\\gamma F(z^{k}-\\gamma F(z^{k}))\" display=\"inline\"><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "z^{k+1}=z^{k}-\\gamma F(z^{k}-\\gamma F(z^{k}))"
          }
        },
        " (the Extragradient step) and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p7.m10\" alttext=\"\\tilde{z}^{k+1}=z^{k}-\\gamma F(\\tilde{z}^{k+1})\" display=\"inline\"><mml:mrow><mml:msup><mml:mover accent=\"true\"><mml:mi>z</mml:mi><mml:mo>~</mml:mo></mml:mover><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mover accent=\"true\"><mml:mi>z</mml:mi><mml:mo>~</mml:mo></mml:mover><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\tilde{z}^{k+1}=z^{k}-\\gamma F(\\tilde{z}^{k+1})"
          }
        },
        " (the PP step).\nThen ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p7.m11\" alttext=\"\\lVert z^{k+1}-\\tilde{z}^{k+1}\\rVert_{2}\" display=\"inline\"><mml:msub><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mover accent=\"true\"><mml:mi>z</mml:mi><mml:mo>~</mml:mo></mml:mover><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msub></mml:math>",
          "meta": {
            "altText": "\\lVert z^{k+1}-\\tilde{z}^{k+1}\\rVert_{2}"
          }
        },
        " can be estimated as follows [",
        {
          "type": "Cite",
          "target": "bib-bib86",
          "content": [
            "86"
          ]
        },
        "]:"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.Ex11",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex11.m1\" alttext=\"\\begin{split}&\\lVert z^{k+1}-\\tilde{z}^{k+1}\\rVert_{2}=\\gamma\\lVert F(z^{k}-\\gamma F(z^{k}))-F(\\tilde{z}^{k+1})\\rVert_{2}\\\\\n&\\qquad\\leq\\gamma L\\lVert z^{k}-\\gamma F(z^{k})-\\tilde{z}^{k+1}\\rVert_{2}=\\gamma^{2}L\\lVert F(z^{k})-F(\\tilde{z}^{k+1})\\rVert_{2}\\\\\n&\\qquad\\leq\\gamma^{2}L^{2}\\lVert z^{k}-\\tilde{z}^{k+1}\\rVert_{2}=\\gamma^{3}L^{2}\\lVert F(\\tilde{z}^{k+1})\\rVert_{2}\\\\\n&\\qquad\\leq\\gamma^{3}L^{3}\\lVert\\tilde{z}^{k+1}-z^{*}\\rVert_{2}.\\end{split}\" display=\"block\"><mml:mtable columnspacing=\"0pt\" displaystyle=\"true\" rowspacing=\"0pt\"><mml:mtr><mml:mtd/><mml:mtd class=\"ltx_align_left\" columnalign=\"left\"><mml:mrow><mml:msub><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mover accent=\"true\"><mml:mi>z</mml:mi><mml:mo>~</mml:mo></mml:mover><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msub><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo 
stretchy=\"false\">(</mml:mo><mml:msup><mml:mover accent=\"true\"><mml:mi>z</mml:mi><mml:mo>~</mml:mo></mml:mover><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd class=\"ltx_align_left\" columnalign=\"left\"><mml:mrow><mml:mi/><mml:mo lspace=\"2.278em\">≤</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:mi>L</mml:mi><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msub><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>−</mml:mo><mml:msup><mml:mover accent=\"true\"><mml:mi>z</mml:mi><mml:mo>~</mml:mo></mml:mover><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\" rspace=\"0.1389em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo lspace=\"0.1389em\">=</mml:mo><mml:mrow><mml:msup><mml:mi>γ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>⁢</mml:mo><mml:mi>L</mml:mi><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msub><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mover 
accent=\"true\"><mml:mi>z</mml:mi><mml:mo>~</mml:mo></mml:mover><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd class=\"ltx_align_left\" columnalign=\"left\"><mml:mrow><mml:mi/><mml:mo lspace=\"2.278em\">≤</mml:mo><mml:mrow><mml:msup><mml:mi>γ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>⁢</mml:mo><mml:msup><mml:mi>L</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msub><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mover accent=\"true\"><mml:mi>z</mml:mi><mml:mo>~</mml:mo></mml:mover><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\" rspace=\"0.1389em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo lspace=\"0.1389em\">=</mml:mo><mml:mrow><mml:msup><mml:mi>γ</mml:mi><mml:mn>3</mml:mn></mml:msup><mml:mo>⁢</mml:mo><mml:msup><mml:mi>L</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msub><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mover accent=\"true\"><mml:mi>z</mml:mi><mml:mo>~</mml:mo></mml:mover><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd class=\"ltx_align_left\" columnalign=\"left\"><mml:mrow><mml:mrow><mml:mi/><mml:mo 
lspace=\"2.278em\">≤</mml:mo><mml:mrow><mml:msup><mml:mi>γ</mml:mi><mml:mn>3</mml:mn></mml:msup><mml:mo>⁢</mml:mo><mml:msup><mml:mi>L</mml:mi><mml:mn>3</mml:mn></mml:msup><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msub><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msup><mml:mover accent=\"true\"><mml:mi>z</mml:mi><mml:mo>~</mml:mo></mml:mover><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\" rspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math>",
      "meta": {
        "altText": "\\begin{split}&\\lVert z^{k+1}-\\tilde{z}^{k+1}\\rVert_{2}=\\gamma\\lVert F(z^{k}-\\gamma F(z^{k}))-F(\\tilde{z}^{k+1})\\rVert_{2}\\\\\n&\\qquad\\leq\\gamma L\\lVert z^{k}-\\gamma F(z^{k})-\\tilde{z}^{k+1}\\rVert_{2}=\\gamma^{2}L\\lVert F(z^{k})-F(\\tilde{z}^{k+1})\\rVert_{2}\\\\\n&\\qquad\\leq\\gamma^{2}L^{2}\\lVert z^{k}-\\tilde{z}^{k+1}\\rVert_{2}=\\gamma^{3}L^{2}\\lVert F(\\tilde{z}^{k+1})\\rVert_{2}\\\\\n&\\qquad\\leq\\gamma^{3}L^{3}\\lVert\\tilde{z}^{k+1}-z^{*}\\rVert_{2}.\\end{split}"
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "That is, the difference between the Extragradient and PP steps is of the order ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p7.m12\" alttext=\"\\mathcal{O}(\\gamma^{3})\" display=\"inline\"><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>γ</mml:mi><mml:mn>3</mml:mn></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\mathcal{O}(\\gamma^{3})"
          }
        },
        " rather than ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p7.m13\" alttext=\"\\mathcal{O}(\\gamma^{2})\" display=\"inline\"><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>γ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\mathcal{O}(\\gamma^{2})"
          }
        },
        ".\nSince the latter corresponds to the difference between PP and the simple gradient step (",
        {
          "type": "Cite",
          "target": "S3-E7",
          "content": [
            "7"
          ]
        },
        "), the Extragradient method approximates PP better than gradient steps, which are known to be non-convergent for general monotone Lipschitz variational inequalities.\nThis approximation feature of the Extragradient method is crucial for its convergence and, as the above derivation implies, the approximation argument significantly relies on the Lipschitzness of the operator ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p7.m14\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
          "meta": {
            "altText": "F"
          }
        },
        "."
      ]
    },
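    {
      "type": "Paragraph",
      "content": [
        "The estimate above can be checked numerically. The following minimal Python sketch is an illustration only and is not taken from the cited works: it assumes a toy linear operator F(z) = Az, where A is a skew-symmetric matrix plus a small multiple of the identity, so that the solution is the origin and the PP step can be computed exactly by solving a linear system. The names A, mu, z_eg, z_pp and z_gd are hypothetical. The script asserts the cubic bound derived above and prints the gaps between PP and the Extragradient and simple gradient steps for several step sizes."
      ]
    },
    {
      "type": "CodeBlock",
      "programmingLanguage": "python",
      "text": "import numpy as np\n\n# Assumed toy problem: F(z) = A z with A = skew part + mu * I, so z* = 0.\nrng = np.random.default_rng(0)\nd, mu = 6, 0.1\nB = rng.standard_normal((d, d))\nA = (B - B.T) + mu * np.eye(d)\nL = np.linalg.norm(A, 2)  # Lipschitz constant of F (spectral norm of A)\n\n\ndef F(z):\n    return A @ z\n\n\nz_k = rng.standard_normal(d)\nfor gamma in (0.1 / L, 0.5 / L, 1.0 / L):\n    z_eg = z_k - gamma * F(z_k - gamma * F(z_k))  # Extragradient step\n    z_pp = np.linalg.solve(np.eye(d) + gamma * A, z_k)  # PP step, solved exactly\n    z_gd = z_k - gamma * F(z_k)  # simple gradient step\n    gap_eg = np.linalg.norm(z_eg - z_pp)\n    gap_gd = np.linalg.norm(z_gd - z_pp)\n    bound = (gamma * L) ** 3 * np.linalg.norm(z_pp)  # gamma^3 L^3 ||z_pp - z*||\n    assert gap_eg <= bound + 1e-12\n    print(f'gamma*L={gamma * L:.1f}  EG-PP gap={gap_eg:.2e}  bound={bound:.2e}  GD-PP gap={gap_gd:.2e}')"
    },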
    {
      "type": "Paragraph",
      "id": "S4.SS1.p8",
      "content": [
        "Let us go back to the differences between I-SEG and S-SEG.\nIn S-SEG, the ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p8.m1\" alttext=\"k\" display=\"inline\"><mml:mi>k</mml:mi></mml:math>",
          "meta": {
            "altText": "k"
          }
        },
        "-th iteration can be regarded as a single Extragradient step for the operator ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p8.m2\" alttext=\"F_{\\xi^{k}}(z)\" display=\"inline\"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:msup><mml:mi>ξ</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "F_{\\xi^{k}}(z)"
          }
        },
        ".\nTherefore, Lipschitzness and monotonicity of ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p8.m3\" alttext=\"F_{\\xi^{k}}(z)\" display=\"inline\"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:msup><mml:mi>ξ</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "F_{\\xi^{k}}(z)"
          }
        },
        " (Assumption ",
        {
          "type": "Cite",
          "target": "Thm112assumption4",
          "content": [
            "4"
          ]
        },
        ") are important for the analysis of S-SEG.\nIn contrast, I-SEG uses different operators for the extrapolation and update steps.\nIn this case, there is no effect from the Lipschitzness/monotonicity of individual ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p8.m4\" alttext=\"F_{\\xi}(z)\" display=\"inline\"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "F_{\\xi}(z)"
          }
        },
        "s.\nTherefore, the analysis of I-SEG naturally relies on the Lipschitzness and monotonicity of ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p8.m5\" alttext=\"F(z)\" display=\"inline\"><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "F(z)"
          }
        },
        " as well as on the closeness (on average) of ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p8.m6\" alttext=\"F_{\\xi}(z)\" display=\"inline\"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "F_{\\xi}(z)"
          }
        },
        " and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p8.m7\" alttext=\"F(z)\" display=\"inline\"><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "F(z)"
          }
        },
        " (Assumption ",
        {
          "type": "Cite",
          "target": "Thm112assumption3",
          "content": [
            "3"
          ]
        },
        ")."
      ]
    },
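    {
      "type": "Paragraph",
      "content": [
        "The distinction between the two sampling schemes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithms exactly as analyzed in the cited works: it uses a single constant step size for both the extrapolation and the update steps, and a hypothetical finite-sum problem with linear operators F_i(z) = A_i z - b_i, each of which is Lipschitz and strongly monotone. The only difference between the two variants is whether the update step reuses the sample drawn for the extrapolation step (S-SEG) or draws an independent one (I-SEG)."
      ]
    },
    {
      "type": "CodeBlock",
      "programmingLanguage": "python",
      "text": "import numpy as np\n\nrng = np.random.default_rng(0)\nn, d, mu = 20, 4, 0.5\n# Assumed toy finite sum: F_i(z) = A_i z - b_i, with A_i = mu * I + skew part.\nA = np.empty((n, d, d))\nfor i in range(n):\n    B = rng.standard_normal((d, d))\n    A[i] = mu * np.eye(d) + 0.5 * (B - B.T)\nb = rng.standard_normal((n, d))\nz_star = np.linalg.solve(A.mean(axis=0), b.mean(axis=0))  # F(z*) = 0\n\n\ndef F_i(z, i):\n    return A[i] @ z - b[i]\n\n\ndef seg(same_sample, gamma=0.05, iters=5000):\n    z = np.zeros(d)\n    for _ in range(iters):\n        i = rng.integers(n)\n        j = i if same_sample else rng.integers(n)  # S-SEG reuses the sample\n        z_half = z - gamma * F_i(z, i)  # extrapolation step\n        z = z - gamma * F_i(z_half, j)  # update step\n    return np.linalg.norm(z - z_star)\n\n\nprint('S-SEG distance to z*:', seg(same_sample=True))\nprint('I-SEG distance to z*:', seg(same_sample=False))"
    },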
    {
      "type": "Paragraph",
      "id": "S4.SS1.p9",
      "content": [
        "The convergence of I-SEG was discussed earlier in this section.\nRegarding S-SEG, one has the following result [",
        {
          "type": "Cite",
          "target": "bib-bib86",
          "content": [
            "86"
          ]
        },
        "]."
      ]
    },
    {
      "type": "Claim",
      "id": "Thm112theorem5",
      "claimType": "Theorem",
      "label": "Theorem 5.",
      "title": [
        {
          "type": "Strong",
          "content": [
            "Theorem 5"
          ]
        },
        {
          "type": "Strong",
          "content": [
            "."
          ]
        }
      ],
      "content": [
        {
          "type": "Paragraph",
          "content": [
            {
              "type": "Emphasis",
              "content": [
                "Let Assumption ",
                {
                  "type": "Cite",
                  "target": "Thm112assumption4",
                  "content": [
                    "4"
                  ]
                },
                " hold.\nThen there exists a choice of step size ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem5.p1.m1\" alttext=\"\\gamma\" display=\"inline\"><mml:mi>γ</mml:mi></mml:math>",
                  "meta": {
                    "altText": "\\gamma"
                  }
                },
                " (see [",
                {
                  "type": "Cite",
                  "target": "bib-bib48",
                  "content": [
                    "48"
                  ]
                },
                "]) such that the output of S-SEG after ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem5.p1.m2\" alttext=\"k\" display=\"inline\"><mml:mi>k</mml:mi></mml:math>",
                  "meta": {
                    "altText": "k"
                  }
                },
                " iterations satisfies\n"
              ]
            }
          ]
        },
        {
          "type": "MathBlock",
          "id": "S4.Ex12",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex12.m1\" alttext=\"\\mathbb{E}\\lVert z^{k}-z^{*}\\rVert_{2}^{2}=\\mathcal{O}\\biggl(\\frac{LR^{2}_{0}}{\\mu}\\exp\\biggl(-\\frac{\\mu k}{L}\\biggr)+\\frac{\\sigma_{*}^{2}}{\\mu^{2}k}\\biggr),\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:mi>𝔼</mml:mi><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msubsup><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\" rspace=\"0.1389em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mo lspace=\"0.1389em\">=</mml:mo><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:mrow><mml:mfrac><mml:mrow><mml:mi>L</mml:mi><mml:mo>⁢</mml:mo><mml:msubsup><mml:mi>R</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mi>μ</mml:mi></mml:mfrac><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mi>exp</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:mo>−</mml:mo><mml:mfrac><mml:mrow><mml:mi>μ</mml:mi><mml:mo>⁢</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mi>L</mml:mi></mml:mfrac></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mfrac><mml:msubsup><mml:mi>σ</mml:mi><mml:mo>*</mml:mo><mml:mn>2</mml:mn></mml:msubsup><mml:mrow><mml:msup><mml:mi>μ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>⁢</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\mathbb{E}\\lVert z^{k}-z^{*}\\rVert_{2}^{2}=\\mathcal{O}\\biggl(\\frac{LR^{2}_{0}}{\\mu}\\exp\\biggl(-\\frac{\\mu k}{L}\\biggr)+\\frac{\\sigma_{*}^{2}}{\\mu^{2}k}\\biggr),"
          }
        },
        {
          "type": "Paragraph",
          "content": [
            {
              "type": "Emphasis",
              "content": [
                "where ",
                {
                  "type": "MathFragment",
                  "mathLanguage": "mathml",
                  "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112theorem5.p1.m3\" alttext=\"\\sigma_{*}^{2}=\\mathbb{E}\\lVert F_{\\xi}(z^{*})\\rVert_{2}^{2}\" display=\"inline\"><mml:mrow><mml:msubsup><mml:mi>σ</mml:mi><mml:mo mathvariant=\"normal\">*</mml:mo><mml:mn mathvariant=\"normal\">2</mml:mn></mml:msubsup><mml:mo mathvariant=\"normal\">=</mml:mo><mml:mrow><mml:mi>𝔼</mml:mi><mml:mo lspace=\"0em\" mathvariant=\"italic\">⁢</mml:mo><mml:msubsup><mml:mrow><mml:mo fence=\"true\" mathvariant=\"normal\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub><mml:mo mathvariant=\"italic\">⁢</mml:mo><mml:mrow><mml:mo mathvariant=\"normal\" stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo mathvariant=\"normal\">*</mml:mo></mml:msup><mml:mo mathvariant=\"normal\" stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\" mathvariant=\"normal\">∥</mml:mo></mml:mrow><mml:mn mathvariant=\"normal\">2</mml:mn><mml:mn mathvariant=\"normal\">2</mml:mn></mml:msubsup></mml:mrow></mml:mrow></mml:math>",
                  "meta": {
                    "altText": "\\sigma_{*}^{2}=\\mathbb{E}\\lVert F_{\\xi}(z^{*})\\rVert_{2}^{2}"
                  }
                },
                "."
              ]
            }
          ]
        }
      ]
    },
    {
      "type": "Paragraph",
      "content": [
        "This rate is similar to the one known for I-SEG, with the following differences.\nFirst, instead of the uniform bound on the variance ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m1\" alttext=\"\\sigma^{2}\" display=\"inline\"><mml:msup><mml:mi>σ</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:math>",
          "meta": {
            "altText": "\\sigma^{2}"
          }
        },
        ", the rate depends on ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m2\" alttext=\"\\sigma_{*}^{2}\" display=\"inline\"><mml:msubsup><mml:mi>σ</mml:mi><mml:mo>*</mml:mo><mml:mn>2</mml:mn></mml:msubsup></mml:math>",
          "meta": {
            "altText": "\\sigma_{*}^{2}"
          }
        },
        ", which is the variance of ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m3\" alttext=\"F_{\\xi}\" display=\"inline\"><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub></mml:math>",
          "meta": {
            "altText": "F_{\\xi}"
          }
        },
        " measured at the solution.\nIn many cases, ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m4\" alttext=\"\\sigma^{2}=\\infty\" display=\"inline\"><mml:mrow><mml:msup><mml:mi>σ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:mi mathvariant=\"normal\">∞</mml:mi></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\sigma^{2}=\\infty"
          }
        },
        ", while ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m5\" alttext=\"\\sigma_{*}^{2}\" display=\"inline\"><mml:msubsup><mml:mi>σ</mml:mi><mml:mo>*</mml:mo><mml:mn>2</mml:mn></mml:msubsup></mml:math>",
          "meta": {
            "altText": "\\sigma_{*}^{2}"
          }
        },
        " is finite.\nFrom this perspective, S-SEG enjoys a better rate of convergence than I-SEG.\nHowever, this comes at a price: while the rate of I-SEG depends on the Lipschitz and strong-monotonicity constants of ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m6\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
          "meta": {
            "altText": "F"
          }
        },
        ", the rate of S-SEG depends on ",
        {
          "type": "Emphasis",
          "content": [
            "the worst"
          ]
        },
        " constants of ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m7\" alttext=\"F_{\\xi}\" display=\"inline\"><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub></mml:math>",
          "meta": {
            "altText": "F_{\\xi}"
          }
        },
        ", which can be much worse than those for ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m8\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
          "meta": {
            "altText": "F"
          }
        },
        ".\nIn particular, consider the finite-sum setup with uniform sampling of ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m9\" alttext=\"\\xi\" display=\"inline\"><mml:mi>ξ</mml:mi></mml:math>",
          "meta": {
            "altText": "\\xi"
          }
        },
        ": ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m10\" alttext=\"F(x)=\\frac{1}{n}\\sum_{i=1}^{n}F_{i}(x)\" display=\"inline\"><mml:mrow><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mo>⁢</mml:mo><mml:mrow><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "F(x)=\\frac{1}{n}\\sum_{i=1}^{n}F_{i}(x)"
          }
        },
        ", where ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m11\" alttext=\"F_{i}\" display=\"inline\"><mml:msub><mml:mi>F</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math>",
          "meta": {
            "altText": "F_{i}"
          }
        },
        " is ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m12\" alttext=\"L_{i}\" display=\"inline\"><mml:msub><mml:mi>L</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math>",
          "meta": {
            "altText": "L_{i}"
          }
        },
        "-Lipschitz and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m13\" alttext=\"\\mu_{i}\" display=\"inline\"><mml:msub><mml:mi>μ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math>",
          "meta": {
            "altText": "\\mu_{i}"
          }
        },
        "-strongly monotone, and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m14\" alttext=\"\\mathbb{P}\\{\\xi=i\\}=\\frac{1}{n}\" display=\"inline\"><mml:mrow><mml:mrow><mml:mi>ℙ</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">{</mml:mo><mml:mrow><mml:mi>ξ</mml:mi><mml:mo>=</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy=\"false\">}</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\mathbb{P}\\{\\xi=i\\}=\\frac{1}{n}"
          }
        },
        ".\nThen Assumption ",
        {
          "type": "Cite",
          "target": "Thm112assumption4",
          "content": [
            "4"
          ]
        },
        " holds with ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m15\" alttext=\"L=\\max_{1\\leq i\\leq n}L_{i}\" display=\"inline\"><mml:mrow><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>max</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>≤</mml:mo><mml:mi>i</mml:mi><mml:mo>≤</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo lspace=\"0.167em\">⁡</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "L=\\max_{1\\leq i\\leq n}L_{i}"
          }
        },
        " and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m16\" alttext=\"\\mu=\\min_{1\\leq i\\leq n}\\mu_{i}\" display=\"inline\"><mml:mrow><mml:mi>μ</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>min</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>≤</mml:mo><mml:mi>i</mml:mi><mml:mo>≤</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo lspace=\"0.167em\">⁡</mml:mo><mml:msub><mml:mi>μ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\mu=\\min_{1\\leq i\\leq n}\\mu_{i}"
          }
        },
        ", and these constants appear in the rate from Theorem ",
        {
          "type": "Cite",
          "target": "Thm112assumption3",
          "content": [
            "3"
          ]
        },
        ".\nThe authors of [",
        {
          "type": "Cite",
          "target": "bib-bib48",
          "content": [
            "48"
          ]
        },
        "] tighten this rate.\nIn particular, they prove that S-SEG with different step sizes for the extrapolation and update steps satisfies"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.Ex13",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex13.m1\" alttext=\"\\mathbb{E}\\lVert z^{k}-z^{*}\\rVert_{2}^{2}=\\mathcal{O}\\biggl(\\frac{LR^{2}_{0}}{\\overline{\\mu}}\\exp\\biggl(-\\frac{\\overline{\\mu}k}{L}\\biggr)+\\frac{\\sigma_{*}^{2}}{\\overline{\\mu}^{2}k}\\biggr),\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:mi>𝔼</mml:mi><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msubsup><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\" rspace=\"0.1389em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mo lspace=\"0.1389em\">=</mml:mo><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:mrow><mml:mfrac><mml:mrow><mml:mi>L</mml:mi><mml:mo>⁢</mml:mo><mml:msubsup><mml:mi>R</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mover accent=\"true\"><mml:mi>μ</mml:mi><mml:mo>¯</mml:mo></mml:mover></mml:mfrac><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mi>exp</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:mo>−</mml:mo><mml:mfrac><mml:mrow><mml:mover accent=\"true\"><mml:mi>μ</mml:mi><mml:mo>¯</mml:mo></mml:mover><mml:mo>⁢</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mi>L</mml:mi></mml:mfrac></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mfrac><mml:msubsup><mml:mi>σ</mml:mi><mml:mo>*</mml:mo><mml:mn>2</mml:mn></mml:msubsup><mml:mrow><mml:msup><mml:mover accent=\"true\"><mml:mi>μ</mml:mi><mml:mo>¯</mml:mo></mml:mover><mml:mn>2</mml:mn></mml:msup><mml:mo>⁢</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\mathbb{E}\\lVert z^{k}-z^{*}\\rVert_{2}^{2}=\\mathcal{O}\\biggl(\\frac{LR^{2}_{0}}{\\overline{\\mu}}\\exp\\biggl(-\\frac{\\overline{\\mu}k}{L}\\biggr)+\\frac{\\sigma_{*}^{2}}{\\overline{\\mu}^{2}k}\\biggr),"
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "where ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m17\" alttext=\"\\sigma_{*}^{2}=\\frac{1}{n}\\sum_{i=1}^{n}\\lVert F_{i}(z^{*})\\rVert_{2}^{2}\" display=\"inline\"><mml:mrow><mml:msubsup><mml:mi>σ</mml:mi><mml:mo>*</mml:mo><mml:mn>2</mml:mn></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mo>⁢</mml:mo><mml:mrow><mml:msubsup><mml:mo rspace=\"0em\">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:msubsup><mml:mrow><mml:mo fence=\"true\" lspace=\"0em\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\sigma_{*}^{2}=\\frac{1}{n}\\sum_{i=1}^{n}\\lVert F_{i}(z^{*})\\rVert_{2}^{2}"
          }
        },
        " and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m18\" alttext=\"\\overline{\\mu}=\\frac{1}{n}\\sum_{i=1}^{n}\\mu_{i}\" display=\"inline\"><mml:mrow><mml:mover accent=\"true\"><mml:mi>μ</mml:mi><mml:mo>¯</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mo>⁢</mml:mo><mml:mrow><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:msub><mml:mi>μ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\overline{\\mu}=\\frac{1}{n}\\sum_{i=1}^{n}\\mu_{i}"
          }
        },
        ".\nSince ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m19\" alttext=\"\\overline{\\mu}\" display=\"inline\"><mml:mover accent=\"true\"><mml:mi>μ</mml:mi><mml:mo>¯</mml:mo></mml:mover></mml:math>",
          "meta": {
            "altText": "\\overline{\\mu}"
          }
        },
        " is never smaller and is sometimes considerably larger than ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m20\" alttext=\"\\mu\" display=\"inline\"><mml:mi>μ</mml:mi></mml:math>",
          "meta": {
            "altText": "\\mu"
          }
        },
        ", the improvement can be substantial.\nMoreover, when the constants ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m21\" alttext=\"\\{L_{i}\\}_{i=1}^{n}\" display=\"inline\"><mml:msubsup><mml:mrow><mml:mo stretchy=\"false\">{</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy=\"false\">}</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup></mml:math>",
          "meta": {
            "altText": "\\{L_{i}\\}_{i=1}^{n}"
          }
        },
        " are known, one can consider the so-called ",
        {
          "type": "Emphasis",
          "content": [
            "importance sampling"
          ]
        },
        " [",
        {
          "type": "Cite",
          "target": "bib-bib52",
          "content": [
            "52"
          ]
        },
        "]: ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m22\" alttext=\"\\mathbb{P}\\{\\xi=i\\}={L_{i}}/{(n\\overline{L})}\" display=\"inline\"><mml:mrow><mml:mrow><mml:mi>ℙ</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">{</mml:mo><mml:mrow><mml:mi>ξ</mml:mi><mml:mo>=</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy=\"false\">}</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mo>⁢</mml:mo><mml:mover accent=\"true\"><mml:mi>L</mml:mi><mml:mo>¯</mml:mo></mml:mover></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\mathbb{P}\\{\\xi=i\\}={L_{i}}/{(n\\overline{L})}"
          }
        },
        ", where ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m23\" alttext=\"\\overline{L}=\\frac{1}{n}\\sum_{i=1}^{n}L_{i}\" display=\"inline\"><mml:mrow><mml:mover accent=\"true\"><mml:mi>L</mml:mi><mml:mo>¯</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mo>⁢</mml:mo><mml:mrow><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:msub><mml:mi>L</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\overline{L}=\\frac{1}{n}\\sum_{i=1}^{n}L_{i}"
          }
        },
        ".\nAs the authors of [",
        {
          "type": "Cite",
          "target": "bib-bib48",
          "content": [
            "48"
          ]
        },
        "] show, importance sampling can be combined with S-SEG by allowing the extrapolation and update step sizes at the ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m24\" alttext=\"k\" display=\"inline\"><mml:mi>k</mml:mi></mml:math>",
          "meta": {
            "altText": "k"
          }
        },
        "-th iteration to depend on the sample ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m25\" alttext=\"\\xi^{k}\" display=\"inline\"><mml:msup><mml:mi>ξ</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:math>",
          "meta": {
            "altText": "\\xi^{k}"
          }
        },
        ".\nIn particular, for the proposed modification of S-SEG they derive the estimate"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.Ex14",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex14.m1\" alttext=\"\\mathbb{E}\\lVert z^{k}-z^{*}\\rVert_{2}^{2}=\\mathcal{O}\\biggl(\\frac{\\overline{L}R_{0}^{2}}{\\overline{\\mu}}\\exp\\biggl(-\\frac{\\overline{\\mu}k}{\\overline{L}}\\biggr)+\\frac{\\hat{\\sigma}_{*}^{2}}{\\overline{\\mu}^{2}k}\\biggr),\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:mi>𝔼</mml:mi><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msubsup><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\" rspace=\"0.1389em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mo lspace=\"0.1389em\">=</mml:mo><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:mrow><mml:mfrac><mml:mrow><mml:mover accent=\"true\"><mml:mi>L</mml:mi><mml:mo>¯</mml:mo></mml:mover><mml:mo>⁢</mml:mo><mml:msubsup><mml:mi>R</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mover accent=\"true\"><mml:mi>μ</mml:mi><mml:mo>¯</mml:mo></mml:mover></mml:mfrac><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mi>exp</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:mo>−</mml:mo><mml:mfrac><mml:mrow><mml:mover accent=\"true\"><mml:mi>μ</mml:mi><mml:mo>¯</mml:mo></mml:mover><mml:mo>⁢</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mover accent=\"true\"><mml:mi>L</mml:mi><mml:mo>¯</mml:mo></mml:mover></mml:mfrac></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mfrac><mml:msubsup><mml:mover accent=\"true\"><mml:mi>σ</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mo>*</mml:mo><mml:mn>2</mml:mn></mml:msubsup><mml:mrow><mml:msup><mml:mover accent=\"true\"><mml:mi>μ</mml:mi><mml:mo>¯</mml:mo></mml:mover><mml:mn>2</mml:mn></mml:msup><mml:mo>⁢</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\mathbb{E}\\lVert z^{k}-z^{*}\\rVert_{2}^{2}=\\mathcal{O}\\biggl(\\frac{\\overline{L}R_{0}^{2}}{\\overline{\\mu}}\\exp\\biggl(-\\frac{\\overline{\\mu}k}{\\overline{L}}\\biggr)+\\frac{\\hat{\\sigma}_{*}^{2}}{\\overline{\\mu}^{2}k}\\biggr),"
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "where ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m26\" alttext=\"\\hat{\\sigma}_{*}^{2}=\\frac{1}{n}\\sum_{i=1}^{n}\\frac{\\overline{L}}{L_{i}}\\lVert\nF_{i}(z^{*})\\rVert_{2}^{2}\" display=\"inline\"><mml:mrow><mml:msubsup><mml:mover accent=\"true\"><mml:mi>σ</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mo>*</mml:mo><mml:mn>2</mml:mn></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mo>⁢</mml:mo><mml:mrow><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:mrow><mml:mfrac><mml:mover accent=\"true\"><mml:mi>L</mml:mi><mml:mo>¯</mml:mo></mml:mover><mml:msub><mml:mi>L</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mfrac><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msubsup><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\hat{\\sigma}_{*}^{2}=\\frac{1}{n}\\sum_{i=1}^{n}\\frac{\\overline{L}}{L_{i}}\\lVert\nF_{i}(z^{*})\\rVert_{2}^{2}"
          }
        },
        ".\nThe exponentially decaying term is never worse than the corresponding one for S-SEG with uniform sampling, since the average Lipschitz constant cannot exceed the largest one.\nThis usually implies faster convergence during the initial stage.\nNext, typically, a larger norm of ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m27\" alttext=\"F_{i}(z^{*})\" display=\"inline\"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "F_{i}(z^{*})"
          }
        },
        " implies a larger ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m28\" alttext=\"L_{i}\" display=\"inline\"><mml:msub><mml:mi>L</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math>",
          "meta": {
            "altText": "L_{i}"
          }
        },
        ", e.g., ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m29\" alttext=\"\\lVert F_{i}(z^{*})\\rVert_{2}^{2}\\sim L_{i}^{2}\" display=\"inline\"><mml:mrow><mml:msubsup><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo>∼</mml:mo><mml:msubsup><mml:mi>L</mml:mi><mml:mi>i</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\lVert F_{i}(z^{*})\\rVert_{2}^{2}\\sim L_{i}^{2}"
          }
        },
        ".\nIn this case, ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m30\" alttext=\"\\hat{\\sigma}_{*}^{2}\\leq\\sigma_{*}^{2}\" display=\"inline\"><mml:mrow><mml:msubsup><mml:mover accent=\"true\"><mml:mi>σ</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mo>*</mml:mo><mml:mn>2</mml:mn></mml:msubsup><mml:mo>≤</mml:mo><mml:msubsup><mml:mi>σ</mml:mi><mml:mo>*</mml:mo><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\hat{\\sigma}_{*}^{2}\\leq\\sigma_{*}^{2}"
          }
        },
        ", because"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.Ex15",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex15.m1\" alttext=\"\\hat{\\sigma}_{*}^{2}\\sim(\\overline{L})^{2}\\quad\\textrm{and}\\quad\\sigma_{*}^{2}\\sim\\overline{L^{2}}=\\frac{1}{n}\\sum_{i=1}^{n}L_{i}^{2}\\geq(\\overline{L})^{2}.\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:msubsup><mml:mover accent=\"true\"><mml:mi>σ</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mo>*</mml:mo><mml:mn>2</mml:mn></mml:msubsup><mml:mo>∼</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mover accent=\"true\"><mml:mi>L</mml:mi><mml:mo>¯</mml:mo></mml:mover><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mspace width=\"1em\"/><mml:mtext>and</mml:mtext></mml:mrow></mml:mrow><mml:mspace width=\"1em\"/><mml:mrow><mml:msubsup><mml:mi>σ</mml:mi><mml:mo>*</mml:mo><mml:mn>2</mml:mn></mml:msubsup><mml:mo>∼</mml:mo><mml:mover accent=\"true\"><mml:msup><mml:mi>L</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>¯</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mo>⁢</mml:mo><mml:mrow><mml:munderover><mml:mo movablelimits=\"false\">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:msubsup><mml:mi>L</mml:mi><mml:mi>i</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mrow><mml:mo>≥</mml:mo><mml:msup><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mover accent=\"true\"><mml:mi>L</mml:mi><mml:mo>¯</mml:mo></mml:mover><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\hat{\\sigma}_{*}^{2}\\sim(\\overline{L})^{2}\\quad\\textrm{and}\\quad\\sigma_{*}^{2}\\sim\\overline{L^{2}}=\\frac{1}{n}\\sum_{i=1}^{n}L_{i}^{2}\\geq(\\overline{L})^{2}."
      }
    },
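    {
      "type": "Paragraph",
      "content": [
        "As a rough numerical illustration of this comparison, the following Python sketch uses hypothetical Lipschitz constants and assumes the scaling from the example above, in which the squared norm of each summand at the solution equals the square of its Lipschitz constant. The array L and all variable names are illustrative assumptions and are not taken from the cited work."
      ]
    },
    {
      "type": "CodeBlock",
      "programmingLanguage": "python",
      "text": "import numpy as np\n\n# Hypothetical Lipschitz constants L_i of the summands F_i (illustrative values).\nL = np.array([1.0, 1.0, 10.0])\nn = len(L)\n\n# Assumed scaling from the example in the text: ||F_i(z*)||^2 = L_i^2.\nsq_norms = L ** 2\n\nL_max = L.max()   # constant in the rate for uniform sampling\nL_bar = L.mean()  # constant in the rate for importance sampling\n\n# Importance-sampling probabilities P{xi = i} = L_i / (n * L_bar).\nprobs = L / (n * L_bar)\n\n# Noise constants: sigma_*^2 (uniform) and hat-sigma_*^2 (importance sampling).\nsigma2_uniform = sq_norms.mean()\nsigma2_importance = np.mean(L_bar / L * sq_norms)\n\nprint('L_max =', L_max, ' L_bar =', L_bar)\nprint('sigma_*^2 =', sigma2_uniform, ' hat sigma_*^2 =', sigma2_importance)\nprint('probabilities sum to one:', np.isclose(probs.sum(), 1.0))"
    },
    {
      "type": "Paragraph",
      "content": [
        "For these illustrative values, the average Lipschitz constant is 4 against a maximum of 10, and the noise constant is 16 under importance sampling against 34 under uniform sampling."
      ]
    },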
    {
      "type": "Paragraph",
      "content": [
        "Moreover, one can allow other sampling strategies and cover the case when some ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS1.p10.m31\" alttext=\"\\mu_{i}\" display=\"inline\"><mml:msub><mml:mi>μ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math>",
          "meta": {
            "altText": "\\mu_{i}"
          }
        },
        " are negative; see [",
        {
          "type": "Cite",
          "target": "bib-bib48",
          "content": [
            "48"
          ]
        },
        "] for the details."
      ]
    },
    {
      "type": "Heading",
      "id": "S4.SS2",
      "depth": 2,
      "content": [
        "4.2 Finite-sum case"
      ]
    },
    {
      "type": "Paragraph",
      "content": [
        "As noted earlier, when we deal with problem (",
        {
          "type": "Cite",
          "target": "S4-E13",
          "content": [
            "13"
          ]
        },
        "), it is often the case (especially in practical problems) that the distribution ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS2.p1.m1\" alttext=\"\\mathcal{D}\" display=\"inline\"><mml:mi class=\"ltx_font_mathcaligraphic\">𝒟</mml:mi></mml:math>",
          "meta": {
            "altText": "\\mathcal{D}"
          }
        },
        " is unknown, but some samples from ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS2.p1.m2\" alttext=\"\\mathcal{D}\" display=\"inline\"><mml:mi class=\"ltx_font_mathcaligraphic\">𝒟</mml:mi></mml:math>",
          "meta": {
            "altText": "\\mathcal{D}"
          }
        },
        " are available.\nThen one can replace problem (",
        {
          "type": "Cite",
          "target": "S4-E13",
          "content": [
            "13"
          ]
        },
        ") by a finite-sum approximation:"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.Ex16",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex16.m1\" alttext=\"F(z)=\\frac{1}{n}\\sum_{i=1}^{n}F_{i}(z).\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mo>⁢</mml:mo><mml:mrow><mml:munderover><mml:mo movablelimits=\"false\">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "F(z)=\\frac{1}{n}\\sum_{i=1}^{n}F_{i}(z)."
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "This approximation is sometimes called the Monte Carlo approximation.\nIn machine learning problems, the term empirical risk is also common.\nAlthough calls to the full operator are now tractable, they remain expensive in practice.\nTherefore, it is worth avoiding frequent computation of ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS2.p1.m3\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
          "meta": {
            "altText": "F"
          }
        },
        " and relying mainly on calls to single ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS2.p1.m4\" alttext=\"F_{i}\" display=\"inline\"><mml:msub><mml:mi>F</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math>",
          "meta": {
            "altText": "F_{i}"
          }
        },
        " operators or small batches of them."
      ]
    },
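    {
      "type": "Paragraph",
      "content": [
        "A minimal Python sketch of this trade-off, assuming hypothetical affine summands (the arrays A and b, the sizes n and d, and the function names are illustrative and are not part of the survey): the full operator touches all n summands, whereas a uniformly sampled mini-batch gives an unbiased but noisy estimate at a fraction of the cost."
      ]
    },
    {
      "type": "CodeBlock",
      "programmingLanguage": "python",
      "text": "import numpy as np\n\nrng = np.random.default_rng(0)\nn, d = 1000, 5  # hypothetical number of summands and dimension\n\n# Hypothetical affine summands F_i(z) = A_i z + b_i.\nA = rng.standard_normal((n, d, d))\nb = rng.standard_normal((n, d))\n\ndef full_operator(z):\n    # F(z) = (1/n) * sum_i F_i(z); costs n evaluations of summands.\n    return (A @ z + b).mean(axis=0)\n\ndef batch_operator(z, batch_size):\n    # Average of F_i(z) over indices drawn uniformly with replacement;\n    # an unbiased estimate of F(z) that costs batch_size evaluations.\n    idx = rng.integers(0, n, size=batch_size)\n    return (A[idx] @ z + b[idx]).mean(axis=0)\n\nz = np.ones(d)\n# Distance between the full operator and a mini-batch estimate of size 10.\nprint(np.linalg.norm(full_operator(z) - batch_operator(z, 10)))"
    },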
    {
      "type": "Paragraph",
      "id": "S4.SS2.p2",
      "content": [
        "Before presenting the results, let us introduce the appropriate analogue of the Lipschitzness assumption."
      ]
    },
    {
      "type": "Claim",
      "id": "Thm112assumption5",
      "label": "Assumption 5 (Lipschitzness in the mean).",
      "title": [
        {
          "type": "Strong",
          "content": [
            "Assumption 5"
          ]
        },
        " (Lipschitzness in the mean)",
        {
          "type": "Strong",
          "content": [
            "."
          ]
        }
      ],
      "content": [
        {
          "type": "Paragraph",
          "content": [
            "The operator ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption5.p1.m1\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
              "meta": {
                "altText": "F"
              }
            },
            " is ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption5.p1.m2\" alttext=\"L_{\\mathrm{avg}}\" display=\"inline\"><mml:msub><mml:mi>L</mml:mi><mml:mi>avg</mml:mi></mml:msub></mml:math>",
              "meta": {
                "altText": "L_{\\mathrm{avg}}"
              }
            },
            "-Lipschitz continuous in the mean, i.e., for all ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption5.p1.m3\" alttext=\"u,v\\in\\mathcal{Z}\" display=\"inline\"><mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo>∈</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:mrow></mml:math>",
              "meta": {
                "altText": "u,v\\in\\mathcal{Z}"
              }
            },
            ", we have"
          ]
        },
        {
          "type": "MathBlock",
          "id": "S4.Ex17",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex17.m1\" alttext=\"\\mathbb{E}[\\lVert F_{\\xi}(u)-F_{\\xi}(v)\\rVert^{2}_{*}]\\leq L^{2}_{\\mathrm{avg}}\\lVert u-v\\rVert^{2}.\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:mi>𝔼</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">[</mml:mo><mml:msubsup><mml:mrow><mml:mo fence=\"true\" lspace=\"0em\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>u</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>v</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\" rspace=\"0em\">∥</mml:mo></mml:mrow><mml:mo>*</mml:mo><mml:mn>2</mml:mn></mml:msubsup><mml:mo stretchy=\"false\">]</mml:mo></mml:mrow></mml:mrow><mml:mo>≤</mml:mo><mml:mrow><mml:msubsup><mml:mi>L</mml:mi><mml:mi>avg</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msup><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>−</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\" rspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\mathbb{E}[\\lVert F_{\\xi}(u)-F_{\\xi}(v)\\rVert^{2}_{*}]\\leq L^{2}_{\\mathrm{avg}}\\lVert u-v\\rVert^{2}."
          }
        }
      ]
    },
    {
      "type": "Paragraph",
      "content": [
        "For example, if ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS2.p3.m1\" alttext=\"F_{i}\" display=\"inline\"><mml:msub><mml:mi>F</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math>",
          "meta": {
            "altText": "F_{i}"
          }
        },
        " is ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS2.p3.m2\" alttext=\"L_{i}\" display=\"inline\"><mml:msub><mml:mi>L</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math>",
          "meta": {
            "altText": "L_{i}"
          }
        },
        "-Lipschitz for all ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS2.p3.m3\" alttext=\"i\" display=\"inline\"><mml:mi>i</mml:mi></mml:math>",
          "meta": {
            "altText": "i"
          }
        },
        " and we draw the index ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS2.p3.m4\" alttext=\"\\xi=i\" display=\"inline\"><mml:mrow><mml:mi>ξ</mml:mi><mml:mo>=</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\xi=i"
          }
        },
        " with probability ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS2.p3.m5\" alttext=\"p_{i}=L_{i}/{\\sum_{j}L_{j}}\" display=\"inline\"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo rspace=\"0.055em\">/</mml:mo><mml:mrow><mml:msub><mml:mo>∑</mml:mo><mml:mi>j</mml:mi></mml:msub><mml:msub><mml:mi>L</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "p_{i}=L_{i}/{\\sum_{j}L_{j}}"
          }
        },
        ", then"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.Ex18",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex18.m1\" alttext=\"L_{\\mathrm{avg}}=\\frac{1}{n}\\sum_{j}L_{j}.\" display=\"block\"><mml:mrow><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mi>avg</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mo>⁢</mml:mo><mml:mrow><mml:munder><mml:mo movablelimits=\"false\">∑</mml:mo><mml:mi>j</mml:mi></mml:munder><mml:msub><mml:mi>L</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "L_{\\mathrm{avg}}=\\frac{1}{n}\\sum_{j}L_{j}."
      }
    },
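    {
      "type": "Paragraph",
      "content": [
        "As a quick check of this value, here is a sketch under assumptions that are not stated above: the norm is Euclidean, all ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "L_{i}",
          "meta": {
            "altText": "L_{i}"
          }
        },
        " are positive, and the sampled operator is the importance-weighted one, ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "F_{\\xi}=F_{i}/(np_{i})",
          "meta": {
            "altText": "F_{\\xi}=F_{i}/(np_{i})"
          }
        },
        " when ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\xi=i",
          "meta": {
            "altText": "\\xi=i"
          }
        },
        ", so that it remains an unbiased estimate of ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "F",
          "meta": {
            "altText": "F"
          }
        },
        ". Then"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "tex",
      "text": "\\mathbb{E}\\lVert F_{\\xi}(u)-F_{\\xi}(v)\\rVert_{2}^{2}=\\sum_{i=1}^{n}\\frac{p_{i}}{n^{2}p_{i}^{2}}\\lVert F_{i}(u)-F_{i}(v)\\rVert_{2}^{2}\\leq\\frac{1}{n^{2}}\\sum_{i=1}^{n}\\frac{L_{i}^{2}}{p_{i}}\\lVert u-v\\rVert_{2}^{2}=\\biggl(\\frac{1}{n}\\sum_{j}L_{j}\\biggr)^{2}\\lVert u-v\\rVert_{2}^{2},",
      "meta": {
        "altText": "\\mathbb{E}\\lVert F_{\\xi}(u)-F_{\\xi}(v)\\rVert_{2}^{2}=\\sum_{i=1}^{n}\\frac{p_{i}}{n^{2}p_{i}^{2}}\\lVert F_{i}(u)-F_{i}(v)\\rVert_{2}^{2}\\leq\\frac{1}{n^{2}}\\sum_{i=1}^{n}\\frac{L_{i}^{2}}{p_{i}}\\lVert u-v\\rVert_{2}^{2}=\\biggl(\\frac{1}{n}\\sum_{j}L_{j}\\biggr)^{2}\\lVert u-v\\rVert_{2}^{2},"
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "where the last equality uses ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "L_{i}^{2}/p_{i}=L_{i}\\sum_{j}L_{j}",
          "meta": {
            "altText": "L_{i}^{2}/p_{i}=L_{i}\\sum_{j}L_{j}"
          }
        },
        ". Under these assumptions, the constant in the mean-Lipschitz bound is the arithmetic mean of the individual Lipschitz constants."
      ]
    },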
    {
      "type": "Paragraph",
      "content": [
        "The study of finite-sum problems in stochastic optimization is connected, first of all, with classical methods for minimization problems such as SVRG [",
        {
          "type": "Cite",
          "target": "bib-bib59",
          "content": [
            "59"
          ]
        },
        "] and SAGA [",
        {
          "type": "Cite",
          "target": "bib-bib29",
          "content": [
            "29"
          ]
        },
        "].\nFor saddle point problems, these methods were adapted in [",
        {
          "type": "Cite",
          "target": "bib-bib102",
          "content": [
            "102"
          ]
        },
        "] (in fact, these results are also valid for variational inequalities).\nThe authors considered strongly convex-strongly concave saddle point problems in the Euclidean case and proved the following estimate for SVRG and SAGA:"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.Ex19",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex19.m1\" alttext=\"\\mathbb{E}\\lVert z^{k}-z^{*}\\rVert_{2}^{2}=\\mathcal{O}\\biggl(R^{2}_{0}\\exp\\biggl(-\\min\\biggl\\{\\frac{1}{n},\\frac{\\mu^{2}}{L^{2}_{\\mathrm{avg}}}\\biggr\\}k\\biggr)\\biggr).\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:mi>𝔼</mml:mi><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msubsup><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\" rspace=\"0.1389em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mo lspace=\"0.1389em\">=</mml:mo><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mi>exp</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:mo rspace=\"0.167em\">−</mml:mo><mml:mrow><mml:mrow><mml:mi>min</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">{</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mo>,</mml:mo><mml:mfrac><mml:msup><mml:mi>μ</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:msubsup><mml:mi>L</mml:mi><mml:mi>avg</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mfrac><mml:mo maxsize=\"210%\" minsize=\"210%\">}</mml:mo></mml:mrow></mml:mrow><mml:mo>⁢</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\mathbb{E}\\lVert z^{k}-z^{*}\\rVert_{2}^{2}=\\mathcal{O}\\biggl(R^{2}_{0}\\exp\\biggl(-\\min\\biggl\\{\\frac{1}{n},\\frac{\\mu^{2}}{L^{2}_{\\mathrm{avg}}}\\biggr\\}k\\biggr)\\biggr)."
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "Since this last bound is not tight in terms of ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS2.p4.m1\" alttext=\"L_{\\mathrm{avg}}/\\mu\" display=\"inline\"><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mi>avg</mml:mi></mml:msub><mml:mo>/</mml:mo><mml:mi>μ</mml:mi></mml:mrow></mml:math>",
          "meta": {
            "altText": "L_{\\mathrm{avg}}/\\mu"
          }
        },
        ", the authors proposed accelerating SVRG and SAGA via the Catalyst envelope [",
        {
          "type": "Cite",
          "target": "bib-bib76",
          "content": [
            "76"
          ]
        },
        "].\nIn this case, they obtain the bound"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.E21",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.E21.m1\" alttext=\"\\begin{split}&\\mathbb{E}\\lVert z^{k}-z^{*}\\rVert_{2}^{2}\\\\\n&\\qquad=\\mathcal{O}\\biggl(R^{2}_{0}\\exp\\biggl(-\\min\\biggl\\{\\frac{1}{n};\\frac{\\mu}{\\sqrt{n}L_{\\mathrm{avg}}}\\biggr\\}\\frac{k}{\\log[L_{\\mathrm{avg}}/\\mu]}\\biggr)\\biggr).\\end{split}\" display=\"block\"><mml:mtable columnspacing=\"0pt\" displaystyle=\"true\" rowspacing=\"0pt\"><mml:mtr><mml:mtd/><mml:mtd class=\"ltx_align_left\" columnalign=\"left\"><mml:mrow><mml:mi>𝔼</mml:mi><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msubsup><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd class=\"ltx_align_left\" columnalign=\"left\"><mml:mrow><mml:mrow><mml:mi/><mml:mo lspace=\"2.278em\">=</mml:mo><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mi>exp</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:mo rspace=\"0.167em\">−</mml:mo><mml:mrow><mml:mrow><mml:mi>min</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">{</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mo>;</mml:mo><mml:mfrac><mml:mi>μ</mml:mi><mml:mrow><mml:msqrt><mml:mi>n</mml:mi></mml:msqrt><mml:mo>⁢</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mi>avg</mml:mi></mml:msub></mml:mrow></mml:mfrac><mml:mo maxsize=\"210%\" minsize=\"210%\">}</mml:mo></mml:mrow></mml:mrow><mml:mo>⁢</mml:mo><mml:mfrac><mml:mi>k</mml:mi><mml:mrow><mml:mi>log</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">[</mml:mo><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mi>avg</mml:mi></mml:msub><mml:mo>/</mml:mo><mml:mi>μ</mml:mi></mml:mrow><mml:mo stretchy=\"false\">]</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math>",
      "meta": {
        "altText": "\\begin{split}&\\mathbb{E}\\lVert z^{k}-z^{*}\\rVert_{2}^{2}\\\\\n&\\qquad=\\mathcal{O}\\biggl(R^{2}_{0}\\exp\\biggl(-\\min\\biggl\\{\\frac{1}{n};\\frac{\\mu}{\\sqrt{n}L_{\\mathrm{avg}}}\\biggr\\}\\frac{k}{\\log[L_{\\mathrm{avg}}/\\mu]}\\biggr)\\biggr).\\end{split}"
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "The same estimates, for saddle point methods based on accelerating envelopes, were also presented in [",
        {
          "type": "Cite",
          "target": "bib-bib118",
          "content": [
            "118"
          ]
        },
        "]."
      ]
    },
    {
      "type": "Paragraph",
      "content": [
        "An important step in the study of the finite-sum stochastic setup was taken in the work [",
        {
          "type": "Cite",
          "target": "bib-bib25",
          "content": [
            "25"
          ]
        },
        "], which is primarily focused on bilinear games.\nFor this class of problems, the authors improved estimate (",
        {
          "type": "Cite",
          "target": "S4-E21",
          "content": [
            "21"
          ]
        },
        ") and removed the additional logarithmic factor.\nFor general problems (saddle point problems and variational inequalities), the results of [",
        {
          "type": "Cite",
          "target": "bib-bib25",
          "content": [
            "25"
          ]
        },
        "] are very similar to those in (",
        {
          "type": "Cite",
          "target": "S4-E21",
          "content": [
            "21"
          ]
        },
        ") and also include an additional logarithmic factor.\nThe authors also considered the convex-concave/monotone case in the non-Euclidean setting and found that for their method after ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS2.p5.m1\" alttext=\"k\" display=\"inline\"><mml:mi>k</mml:mi></mml:math>",
          "meta": {
            "altText": "k"
          }
        },
        " iterations it holds that"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.E22",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.E22.m1\" alttext=\"\\mathbb{E}[\\operatorname{Gap}_{\\mathrm{VI}}(\\hat{z}^{k})]=\\tilde{\\mathcal{O}}\\biggl(\\frac{\\sqrt{n}L_{\\mathrm{avg}}D^{2}_{\\mathcal{Z},V}}{k}\\biggr).\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:mi>𝔼</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">[</mml:mo><mml:mrow><mml:msub><mml:mi>Gap</mml:mi><mml:mi>VI</mml:mi></mml:msub><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mover accent=\"true\"><mml:mi>z</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">]</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mover accent=\"true\"><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>~</mml:mo></mml:mover><mml:mo>⁢</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mfrac><mml:mrow><mml:msqrt><mml:mi>n</mml:mi></mml:msqrt><mml:mo>⁢</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mi>avg</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:msubsup><mml:mi>D</mml:mi><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mo>,</mml:mo><mml:mi>V</mml:mi></mml:mrow><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mi>k</mml:mi></mml:mfrac><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\mathbb{E}[\\operatorname{Gap}_{\\mathrm{VI}}(\\hat{z}^{k})]=\\tilde{\\mathcal{O}}\\biggl(\\frac{\\sqrt{n}L_{\\mathrm{avg}}D^{2}_{\\mathcal{Z},V}}{k}\\biggr)."
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "The issue of the additional logarithmic factor was resolved in [",
        {
          "type": "Cite",
          "target": "bib-bib2",
          "content": [
            "2"
          ]
        },
        "], where the following modification of the Extragradient method was proposed:"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.E23X",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.E23X.m1\" alttext=\"\\displaystyle z^{k+1/2}=P_{\\mathcal{Z}}(z^{k}+\\tau(w^{k}-z^{k})-\\gamma F(w^{k})),\" display=\"inline\"><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>+</mml:mo><mml:mrow><mml:mi>τ</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msup><mml:mi>w</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>w</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle z^{k+1/2}=P_{\\mathcal{Z}}(z^{k}+\\tau(w^{k}-z^{k})-\\gamma F(w^{k})),"
      }
    },
    {
      "type": "MathBlock",
      "id": "S4.E23Xa",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.E23Xa.m1\" alttext=\"\\displaystyle\\Delta^{k}=F_{\\xi^{k}}(z^{k+1/2})-F_{\\xi^{k}}(w^{k})+F(w^{k}),\" display=\"inline\"><mml:mrow><mml:mrow><mml:msup><mml:mi mathvariant=\"normal\">Δ</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:msup><mml:mi>ξ</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:msup><mml:mi>ξ</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>w</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>w</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle\\Delta^{k}=F_{\\xi^{k}}(z^{k+1/2})-F_{\\xi^{k}}(w^{k})+F(w^{k}),"
      }
    },
    {
      "type": "MathBlock",
      "id": "S4.E23Xb",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.E23Xb.m1\" alttext=\"\\displaystyle z^{k+1}=P_{\\mathcal{Z}}(z^{k}+\\tau(w^{k}-z^{k})-\\gamma\\Delta^{k})\" display=\"inline\"><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>+</mml:mo><mml:mrow><mml:mi>τ</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msup><mml:mi>w</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:msup><mml:mi mathvariant=\"normal\">Δ</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle z^{k+1}=P_{\\mathcal{Z}}(z^{k}+\\tau(w^{k}-z^{k})-\\gamma\\Delta^{k})"
      }
    },
    {
      "type": "MathBlock",
      "id": "S4.E23Xc",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.E23Xc.m1\" alttext=\"\\displaystyle w^{k+1}=\\begin{cases}z^{k+1},&\\textrm{with probability}\\ p,\\\\\nw^{k},&\\textrm{with probability}\\ 1-p.\\end{cases}\" display=\"inline\"><mml:mrow><mml:msup><mml:mi>w</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnspacing=\"5pt\" rowspacing=\"0pt\"><mml:mtr><mml:mtd class=\"ltx_align_left\" columnalign=\"left\"><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd class=\"ltx_align_left\" columnalign=\"left\"><mml:mrow><mml:mrow><mml:mtext>with probability</mml:mtext><mml:mo lspace=\"0.500em\">⁢</mml:mo><mml:mi>p</mml:mi></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd class=\"ltx_align_left\" columnalign=\"left\"><mml:mrow><mml:msup><mml:mi>w</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd class=\"ltx_align_left\" columnalign=\"left\"><mml:mrow><mml:mrow><mml:mrow><mml:mtext>with probability</mml:mtext><mml:mo>⁢</mml:mo><mml:mn> 1</mml:mn></mml:mrow><mml:mo>−</mml:mo><mml:mi>p</mml:mi></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle w^{k+1}=\\begin{cases}z^{k+1},&\\textrm{with probability}\\ p,\\\\\nw^{k},&\\textrm{with probability}\\ 1-p.\\end{cases}"
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "This algorithm is a combination of the extra step technique from the theory of VIs and the loopless approach [",
        {
          "type": "Cite",
          "target": "bib-bib73",
          "content": [
            "73"
          ]
        },
        "] for finite-sum problems.\nAn interesting ingredient of the method is the randomized negative momentum: ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS2.p5.m2\" alttext=\"\\tau(w^{k}-z^{k})\" display=\"inline\"><mml:mrow><mml:mi>τ</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msup><mml:mi>w</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\tau(w^{k}-z^{k})"
          }
        },
        ".\nWhile for minimization problems it is usual to apply a positive/heavy-ball momentum, the opposite approach proves useful for saddle point problems and variational inequalities.\nThis effect was noticed earlier [",
        {
          "type": "Cite",
          "target": "bib-bib42",
          "content": [
            "42"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib122",
          "content": [
            "122"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib3",
          "content": [
            "3"
          ]
        },
        "] and now also appears in the theory of stochastic methods for VIs.\nAlso, in [",
        {
          "type": "Cite",
          "target": "bib-bib2",
          "content": [
            "2"
          ]
        },
        "], the authors presented modifications of the Forward-Backward, Forward-Reflected-Backward, and Extragradient methods in the non-Euclidean case."
      ]
    },
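    {
      "type": "Paragraph",
      "content": [
        "The role of the correction term in the method above can be checked directly. As a sketch, assume (this is not stated explicitly above) that ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\xi^{k}",
          "meta": {
            "altText": "\\xi^{k}"
          }
        },
        " is sampled independently of ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "z^{k+1/2}",
          "meta": {
            "altText": "z^{k+1/2}"
          }
        },
        " and ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "w^{k}",
          "meta": {
            "altText": "w^{k}"
          }
        },
        " and that ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\mathbb{E}[F_{\\xi}(z)]=F(z)",
          "meta": {
            "altText": "\\mathbb{E}[F_{\\xi}(z)]=F(z)"
          }
        },
        " for every fixed ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "z",
          "meta": {
            "altText": "z"
          }
        },
        ", as with uniform sampling from the finite sum. Then ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\Delta^{k}",
          "meta": {
            "altText": "\\Delta^{k}"
          }
        },
        " is an unbiased estimate of the operator at the extrapolated point:"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "tex",
      "text": "\\mathbb{E}_{\\xi^{k}}[\\Delta^{k}]=F(z^{k+1/2})-F(w^{k})+F(w^{k})=F(z^{k+1/2}).",
      "meta": {
        "altText": "\\mathbb{E}_{\\xi^{k}}[\\Delta^{k}]=F(z^{k+1/2})-F(w^{k})+F(w^{k})=F(z^{k+1/2})."
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "In the Euclidean case, since the variance of a random vector does not exceed its second moment, Assumption 5 additionally bounds the variance of this estimate by the distance to the reference point:"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "tex",
      "text": "\\mathbb{E}_{\\xi^{k}}\\lVert\\Delta^{k}-F(z^{k+1/2})\\rVert_{2}^{2}\\leq\\mathbb{E}_{\\xi^{k}}\\lVert F_{\\xi^{k}}(z^{k+1/2})-F_{\\xi^{k}}(w^{k})\\rVert_{2}^{2}\\leq L_{\\mathrm{avg}}^{2}\\lVert z^{k+1/2}-w^{k}\\rVert_{2}^{2},",
      "meta": {
        "altText": "\\mathbb{E}_{\\xi^{k}}\\lVert\\Delta^{k}-F(z^{k+1/2})\\rVert_{2}^{2}\\leq\\mathbb{E}_{\\xi^{k}}\\lVert F_{\\xi^{k}}(z^{k+1/2})-F_{\\xi^{k}}(w^{k})\\rVert_{2}^{2}\\leq L_{\\mathrm{avg}}^{2}\\lVert z^{k+1/2}-w^{k}\\rVert_{2}^{2},"
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "so the noise vanishes as the iterates and the reference point approach each other."
      ]
    },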
    {
      "type": "Paragraph",
      "content": [
        "As we noted earlier, the results of [",
        {
          "type": "Cite",
          "target": "bib-bib2",
          "content": [
            "2"
          ]
        },
        "] give estimates (",
        {
          "type": "Cite",
          "target": "S4-E21",
          "content": [
            "21"
          ]
        },
        ") and (",
        {
          "type": "Cite",
          "target": "S4-E22",
          "content": [
            "22"
          ]
        },
        "), but without additional logarithmic factors.\nThat is, to achieve"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex20X.m1\" alttext=\"\\displaystyle\\mathbb{E}\\lVert z^{k}-z^{*}\\rVert_{2}^{2}\\leq\\varepsilon\" display=\"inline\"><mml:mrow><mml:mrow><mml:mi>𝔼</mml:mi><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msubsup><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\" rspace=\"0.1389em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mo lspace=\"0.1389em\">≤</mml:mo><mml:mi>ε</mml:mi></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle\\mathbb{E}\\lVert z^{k}-z^{*}\\rVert_{2}^{2}\\leq\\varepsilon"
      }
    },
    {
      "type": "MathBlock",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex20X.m4\" alttext=\"\\displaystyle\\textrm{in the strongly monotone case},\" display=\"inline\"><mml:mrow><mml:mtext>in the strongly monotone case</mml:mtext><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle\\textrm{in the strongly monotone case},"
      }
    },
    {
      "type": "MathBlock",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex20Xa.m1\" alttext=\"\\displaystyle\\mathbb{E}[\\operatorname{Gap}_{\\mathrm{VI}}(\\hat{z}^{k})]\\leq\\varepsilon\" display=\"inline\"><mml:mrow><mml:mrow><mml:mi>𝔼</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">[</mml:mo><mml:mrow><mml:msub><mml:mi>Gap</mml:mi><mml:mi>VI</mml:mi></mml:msub><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mover accent=\"true\"><mml:mi>z</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">]</mml:mo></mml:mrow></mml:mrow><mml:mo>≤</mml:mo><mml:mi>ε</mml:mi></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle\\mathbb{E}[\\operatorname{Gap}_{\\mathrm{VI}}(\\hat{z}^{k})]\\leq\\varepsilon"
      }
    },
    {
      "type": "MathBlock",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex20Xa.m4\" alttext=\"\\displaystyle\\textrm{in the monotone case},\" display=\"inline\"><mml:mrow><mml:mtext>in the monotone case</mml:mtext><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle\\textrm{in the monotone case},"
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "the methods from [",
        {
          "type": "Cite",
          "target": "bib-bib2",
          "content": [
            "2"
          ]
        },
        "] require"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.E24",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.E24.m1\" alttext=\"\\smash[b]{\\mathcal{O}\\biggl(\\max\\biggl\\{n;\\frac{\\sqrt{n}L_{\\mathrm{avg}}}{\\mu}\\biggr\\}\\log\\frac{R_{0}^{2}}{\\varepsilon}\\biggr)}\" display=\"block\"><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:mrow><mml:mi>max</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">{</mml:mo><mml:mi>n</mml:mi><mml:mo>;</mml:mo><mml:mfrac><mml:mrow><mml:msqrt><mml:mi>n</mml:mi></mml:msqrt><mml:mo>⁢</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mi>avg</mml:mi></mml:msub></mml:mrow><mml:mi>μ</mml:mi></mml:mfrac><mml:mo maxsize=\"210%\" minsize=\"210%\">}</mml:mo></mml:mrow></mml:mrow><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mi>log</mml:mi><mml:mo lspace=\"0.167em\">⁡</mml:mo><mml:mfrac><mml:msubsup><mml:mi>R</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mi>ε</mml:mi></mml:mfrac></mml:mrow></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\smash[b]{\\mathcal{O}\\biggl(\\max\\biggl\\{n;\\frac{\\sqrt{n}L_{\\mathrm{avg}}}{\\mu}\\biggr\\}\\log\\frac{R_{0}^{2}}{\\varepsilon}\\biggr)}"
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "and"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.E25",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.E25.m1\" alttext=\"\\smash[t]{\\mathcal{O}\\biggl(\\frac{\\sqrt{n}L_{\\mathrm{avg}}D^{2}_{\\mathcal{Z},V}}{\\varepsilon}\\biggr)}\" display=\"block\"><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mfrac><mml:mrow><mml:msqrt><mml:mi>n</mml:mi></mml:msqrt><mml:mo>⁢</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mi>avg</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:msubsup><mml:mi>D</mml:mi><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mo>,</mml:mo><mml:mi>V</mml:mi></mml:mrow><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mi>ε</mml:mi></mml:mfrac><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\smash[t]{\\mathcal{O}\\biggl(\\frac{\\sqrt{n}L_{\\mathrm{avg}}D^{2}_{\\mathcal{Z},V}}{\\varepsilon}\\biggr)}"
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "stochastic oracle calls, respectively.\nIt remains to discuss the effect of batching on the method from (",
        {
          "type": "Cite",
          "target": "S4-E23",
          "content": [
            "23"
          ]
        },
        "), i.e., to see how the oracle complexity bounds change if, instead of a single sample ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS2.p6.m1\" alttext=\"F_{\\xi^{k}}\" display=\"inline\"><mml:msub><mml:mi>F</mml:mi><mml:msup><mml:mi>ξ</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:msub></mml:math>",
          "meta": {
            "altText": "F_{\\xi^{k}}"
          }
        },
        " at each iteration, we use a mini-batch of size ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS2.p6.m2\" alttext=\"b\" display=\"inline\"><mml:mi>b</mml:mi></mml:math>",
          "meta": {
            "altText": "b"
          }
        },
        ": ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS2.p6.m3\" alttext=\"\\frac{1}{b}\\sum_{i\\in S^{k}}F_{i}\" display=\"inline\"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>b</mml:mi></mml:mfrac><mml:mo>⁢</mml:mo><mml:mrow><mml:msub><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi>S</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:mrow></mml:msub><mml:msub><mml:mi>F</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\frac{1}{b}\\sum_{i\\in S^{k}}F_{i}"
          }
        },
        ", where ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS2.p6.m4\" alttext=\"S^{k}\\subseteq\\{1,\\ldots,n\\}\" display=\"inline\"><mml:mrow><mml:msup><mml:mi>S</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>⊆</mml:mo><mml:mrow><mml:mo stretchy=\"false\">{</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant=\"normal\">…</mml:mi><mml:mo>,</mml:mo><mml:mi>n</mml:mi><mml:mo stretchy=\"false\">}</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "S^{k}\\subseteq\\{1,\\ldots,n\\}"
          }
        },
        " is the set of cardinality ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS2.p6.m5\" alttext=\"b\" display=\"inline\"><mml:mi>b</mml:mi></mml:math>",
          "meta": {
            "altText": "b"
          }
        },
        " of indices in the mini-batch.\nIn this case, the methods from [",
        {
          "type": "Cite",
          "target": "bib-bib2",
          "content": [
            "2"
          ]
        },
        "] give estimates (",
        {
          "type": "Cite",
          "target": "S4-E24",
          "content": [
            "24"
          ]
        },
        ") and (",
        {
          "type": "Cite",
          "target": "S4-E25",
          "content": [
            "25"
          ]
        },
        "), but multiplied by an additional factor ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS2.p6.m6\" alttext=\"\\sqrt{b}\" display=\"inline\"><mml:msqrt><mml:mi>b</mml:mi></mml:msqrt></mml:math>",
          "meta": {
            "altText": "\\sqrt{b}"
          }
        },
        ".\nThis extra factor was removed in [",
        {
          "type": "Cite",
          "target": "bib-bib69",
          "content": [
            "69"
          ]
        },
        "] using the following method:"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.Ex21X",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex21X.m1\" alttext=\"\\displaystyle\\Delta^{k}=\\frac{1}{b}\\sum_{i\\in S^{k}}\\bigl[F_{i}(z^{k})-F_{i}(w^{k-1})+\\alpha(F_{i}(z^{k})-F_{i}(z^{k-1}))\\bigr]+F(w^{k-1}),\" display=\"inline\"><mml:mrow><mml:msup><mml:mi mathvariant=\"normal\">Δ</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:mstyle displaystyle=\"true\"><mml:mfrac><mml:mn>1</mml:mn><mml:mi>b</mml:mi></mml:mfrac></mml:mstyle><mml:mstyle displaystyle=\"true\"><mml:munder><mml:mo movablelimits=\"false\">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi>S</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:mrow></mml:munder></mml:mstyle><mml:mrow><mml:mo maxsize=\"120%\" minsize=\"120%\">[</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow><mml:mo>−</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>w</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>α</mml:mi><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow><mml:mo>−</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow><mml:mo maxsize=\"120%\" minsize=\"120%\">]</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>w</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle\\Delta^{k}=\\frac{1}{b}\\sum_{i\\in S^{k}}\\bigl[F_{i}(z^{k})-F_{i}(w^{k-1})+\\alpha(F_{i}(z^{k})-F_{i}(z^{k-1}))\\bigr]+F(w^{k-1}),"
      }
    },
    {
      "type": "MathBlock",
      "id": "S4.Ex21Xb",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex21Xb.m1\" alttext=\"\\displaystyle z^{k+1}=P_{\\mathcal{Z}}\\bigl(z^{k}+\\tau(w^{k}-z^{k})-\\gamma\\Delta^{k}\\bigr),\" display=\"inline\"><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo maxsize=\"120%\" minsize=\"120%\">(</mml:mo><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>+</mml:mo><mml:mrow><mml:mi>τ</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msup><mml:mi>w</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo>⁢</mml:mo><mml:msup><mml:mi mathvariant=\"normal\">Δ</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:mrow></mml:mrow><mml:mo maxsize=\"120%\" minsize=\"120%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle z^{k+1}=P_{\\mathcal{Z}}\\bigl(z^{k}+\\tau(w^{k}-z^{k})-\\gamma\\Delta^{k}\\bigr),"
      }
    },
    {
      "type": "MathBlock",
      "id": "S4.Ex21Xc",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex21Xc.m1\" alttext=\"\\displaystyle w^{k+1}=\\begin{cases}z^{k+1},&\\textrm{with probability}\\ p,\\\\\nw^{k},&\\textrm{with probability}\\ 1-p.\\end{cases}\" display=\"inline\"><mml:mrow><mml:msup><mml:mi>w</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnspacing=\"5pt\" rowspacing=\"0pt\"><mml:mtr><mml:mtd class=\"ltx_align_left\" columnalign=\"left\"><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd class=\"ltx_align_left\" columnalign=\"left\"><mml:mrow><mml:mrow><mml:mtext>with probability</mml:mtext><mml:mo lspace=\"0.500em\">⁢</mml:mo><mml:mi>p</mml:mi></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd class=\"ltx_align_left\" columnalign=\"left\"><mml:mrow><mml:msup><mml:mi>w</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd class=\"ltx_align_left\" columnalign=\"left\"><mml:mrow><mml:mrow><mml:mrow><mml:mtext>with probability</mml:mtext><mml:mo>⁢</mml:mo><mml:mn> 1</mml:mn></mml:mrow><mml:mo>−</mml:mo><mml:mi>p</mml:mi></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle w^{k+1}=\\begin{cases}z^{k+1},&\\textrm{with probability}\\ p,\\\\\nw^{k},&\\textrm{with probability}\\ 1-p.\\end{cases}"
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "The authors proved that in the strongly monotone case this method gives estimate (",
        {
          "type": "Cite",
          "target": "S4-E24",
          "content": [
            "24"
          ]
        },
        "), i.e., without additional logarithmic factors and without factors depending on ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS2.p6.m7\" alttext=\"b\" display=\"inline\"><mml:mi>b</mml:mi></mml:math>",
          "meta": {
            "altText": "b"
          }
        },
        "."
      ]
    },
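    {
      "type": "Paragraph",
      "content": [
        "A short calculation, offered here only as a sketch, shows why this search direction is a reasonable stand-in for the full operator.\nIt assumes the finite-sum form in which the operator is the average of its components, and that the mini-batch is sampled uniformly at random and independently of the past; the symbol for the conditional expectation given the iterates up to step k is notation introduced for this illustration.\nUnder these assumptions, averaging over the mini-batch gives"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "tex",
      "text": "\\mathbb{E}_{k}[\\Delta^{k}] = F(z^{k}) - F(w^{k-1}) + \\alpha\\bigl(F(z^{k}) - F(z^{k-1})\\bigr) + F(w^{k-1}) = F(z^{k}) + \\alpha\\bigl(F(z^{k}) - F(z^{k-1})\\bigr)."
    },
    {
      "type": "Paragraph",
      "content": [
        "The full operator value at the reference point cancels in expectation, so on average the step uses the current operator value plus the extrapolation term."
      ]
    },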
    {
      "type": "Paragraph",
      "content": [
        "The only issue that remains to be understood is whether the current state-of-the-art methods with the best complexities from [",
        {
          "type": "Cite",
          "target": "bib-bib2",
          "content": [
            "2"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib69",
          "content": [
            "69"
          ]
        },
        "] are optimal.\nThe lower bounds from [",
        {
          "type": "Cite",
          "target": "bib-bib53",
          "content": [
            "53"
          ]
        },
        "] imply that, under Assumptions ",
        {
          "type": "Cite",
          "target": "Thm112assumption5",
          "content": [
            "5"
          ]
        },
        " and ",
        {
          "type": "Cite",
          "target": "Thm112assumption2",
          "content": [
            "2"
          ]
        },
        ", the methods above are optimal.\nHowever, under ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS2.p7.m1\" alttext=\"L_{\\mathrm{max}}\" display=\"inline\"><mml:msub><mml:mi>L</mml:mi><mml:mi>max</mml:mi></mml:msub></mml:math>",
          "meta": {
            "altText": "L_{\\mathrm{max}}"
          }
        },
        "-Lipschitzness of all ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS2.p7.m2\" alttext=\"F_{i}\" display=\"inline\"><mml:msub><mml:mi>F</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math>",
          "meta": {
            "altText": "F_{i}"
          }
        },
        ", ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS2.p7.m3\" alttext=\"i\\in\\{1,\\ldots,n\\}\" display=\"inline\"><mml:mrow><mml:mi>i</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mo stretchy=\"false\">{</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant=\"normal\">…</mml:mi><mml:mo>,</mml:mo><mml:mi>n</mml:mi><mml:mo stretchy=\"false\">}</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "i\\in\\{1,\\ldots,n\\}"
          }
        },
        " and Assumption ",
        {
          "type": "Cite",
          "target": "Thm112assumption2",
          "content": [
            "2"
          ]
        },
        ", the lower bound from [",
        {
          "type": "Cite",
          "target": "bib-bib53",
          "content": [
            "53"
          ]
        },
        "] is"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.Ex22",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex22.m1\" alttext=\"\\mathbb{E}\\lVert z^{k}-z^{*}\\rVert_{2}^{2}=\\Omega\\biggl(R_{0}^{2}\\exp\\biggl(-\\min\\biggl\\{\\frac{1}{n},\\frac{\\mu}{L_{\\mathrm{max}}}\\biggr\\}k\\biggr)\\biggr).\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:mi>𝔼</mml:mi><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msubsup><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\" rspace=\"0.1389em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mo lspace=\"0.1389em\">=</mml:mo><mml:mrow><mml:mi mathvariant=\"normal\">Ω</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mi>exp</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:mo rspace=\"0.167em\">−</mml:mo><mml:mrow><mml:mrow><mml:mi>min</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">{</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mo>,</mml:mo><mml:mfrac><mml:mi>μ</mml:mi><mml:msub><mml:mi>L</mml:mi><mml:mi>max</mml:mi></mml:msub></mml:mfrac><mml:mo maxsize=\"210%\" minsize=\"210%\">}</mml:mo></mml:mrow></mml:mrow><mml:mo>⁢</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\mathbb{E}\\lVert z^{k}-z^{*}\\rVert_{2}^{2}=\\Omega\\biggl(R_{0}^{2}\\exp\\biggl(-\\min\\biggl\\{\\frac{1}{n},\\frac{\\mu}{L_{\\mathrm{max}}}\\biggr\\}k\\biggr)\\biggr)."
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "The question of whether this lower bound is tight remains open."
      ]
    },
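    {
      "type": "Paragraph",
      "content": [
        "To see what is at stake, the lower bound can be rewritten as an oracle complexity.\nThe following is a rough sketch that ignores the constants hidden in the asymptotic notation: requiring the right-hand side to be at most the target accuracy gives"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "tex",
      "text": "R_{0}^{2}\\exp\\Bigl(-\\min\\Bigl\\{\\frac{1}{n},\\frac{\\mu}{L_{\\mathrm{max}}}\\Bigr\\}k\\Bigr)\\leq\\varepsilon \\quad\\Longleftrightarrow\\quad k\\geq\\max\\Bigl\\{n,\\frac{L_{\\mathrm{max}}}{\\mu}\\Bigr\\}\\log\\frac{R_{0}^{2}}{\\varepsilon}."
    },
    {
      "type": "Paragraph",
      "content": [
        "Compared with estimate (",
        {
          "type": "Cite",
          "target": "S4-E24",
          "content": [
            "24"
          ]
        },
        "), the condition-number term here lacks the factor ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\sqrt{n}"
        },
        ".\nAssuming that the average Lipschitz constant does not exceed the maximal one, the upper bound may therefore exceed this lower bound by up to that factor in the condition-number term."
      ]
    },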
    {
      "type": "Heading",
      "id": "S4.SS3",
      "depth": 2,
      "content": [
        "4.3 Cocoercivity assumption"
      ]
    },
    {
      "type": "Paragraph",
      "id": "S4.SS3.p1",
      "content": [
        "In some papers, the following assumption is used instead of Assumption ",
        {
          "type": "Cite",
          "target": "Thm112assumption1",
          "content": [
            "1"
          ]
        },
        "."
      ]
    },
    {
      "type": "Claim",
      "id": "Thm112assumption6",
      "label": "Assumption 6 (Cocoercivity).",
      "title": [
        {
          "type": "Strong",
          "content": [
            "Assumption 6"
          ]
        },
        " (Cocoercivity)",
        {
          "type": "Strong",
          "content": [
            "."
          ]
        }
      ],
      "content": [
        {
          "type": "Paragraph",
          "id": "Thm112assumption6.p1",
          "content": [
            "The operator ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption6.p1.m1\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
              "meta": {
                "altText": "F"
              }
            },
            " is ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption6.p1.m2\" alttext=\"\\ell\" display=\"inline\"><mml:mi mathvariant=\"normal\">ℓ</mml:mi></mml:math>",
              "meta": {
                "altText": "\\ell"
              }
            },
            "-cocoercive, i.e., for all ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption6.p1.m3\" alttext=\"u,v\\in\\mathcal{Z}\" display=\"inline\"><mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo>∈</mml:mo><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:mrow></mml:math>",
              "meta": {
                "altText": "u,v\\in\\mathcal{Z}"
              }
            },
            ", we have ",
            {
              "type": "MathFragment",
              "mathLanguage": "mathml",
              "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"Thm112assumption6.p1.m4\" alttext=\"\\lVert F(u)-F(v)\\rVert_{2}^{2}\\leq\\ell\\langle F(u)-F(v),u-v\\rangle\" display=\"inline\"><mml:mrow><mml:msubsup><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>u</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>v</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo>≤</mml:mo><mml:mrow><mml:mi mathvariant=\"normal\">ℓ</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">⟨</mml:mo><mml:mrow><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>u</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>v</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>−</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo stretchy=\"false\">⟩</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>",
              "meta": {
                "altText": "\\lVert F(u)-F(v)\\rVert_{2}^{2}\\leq\\ell\\langle F(u)-F(v),u-v\\rangle"
              }
            },
            "."
          ]
        }
      ]
    },
    {
      "type": "Paragraph",
      "id": "S4.SS3.p2",
      "content": [
        "Cocoercivity is stronger than monotonicity combined with Lipschitzness: every cocoercive operator is monotone and Lipschitz, but not all monotone Lipschitz operators are cocoercive.\nNote, for instance, that the operator for the bilinear SPP (",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS3.p2.m1\" alttext=\"\\min_{x}\\max_{y}x^{\\top}Ay\" display=\"inline\"><mml:mrow><mml:msub><mml:mi>min</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mo lspace=\"0.167em\">⁡</mml:mo><mml:mrow><mml:msub><mml:mi>max</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo lspace=\"0.167em\">⁡</mml:mo><mml:mrow><mml:msup><mml:mi>x</mml:mi><mml:mo>⊤</mml:mo></mml:msup><mml:mo>⁢</mml:mo><mml:mi>A</mml:mi><mml:mo>⁢</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\min_{x}\\max_{y}x^{\\top}Ay"
          }
        },
        ") is not cocoercive.\nHowever, if ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS3.p2.m2\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
          "meta": {
            "altText": "F"
          }
        },
        " is ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS3.p2.m3\" alttext=\"L\" display=\"inline\"><mml:mi>L</mml:mi></mml:math>",
          "meta": {
            "altText": "L"
          }
        },
        "-Lipschitz and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS3.p2.m4\" alttext=\"\\mu\" display=\"inline\"><mml:mi>μ</mml:mi></mml:math>",
          "meta": {
            "altText": "\\mu"
          }
        },
        "-strongly monotone, then it is ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS3.p2.m5\" alttext=\"(L^{2}/\\mu)\" display=\"inline\"><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msup><mml:mi>L</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>/</mml:mo><mml:mi>μ</mml:mi></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:math>",
          "meta": {
            "altText": "(L^{2}/\\mu)"
          }
        },
        "-cocoercive.\nMoreover, the operator corresponding to a convex ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS3.p2.m6\" alttext=\"L\" display=\"inline\"><mml:mi>L</mml:mi></mml:math>",
          "meta": {
            "altText": "L"
          }
        },
        "-smooth minimization problem is ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS3.p2.m7\" alttext=\"L\" display=\"inline\"><mml:mi>L</mml:mi></mml:math>",
          "meta": {
            "altText": "L"
          }
        },
        "-cocoercive."
      ]
    },
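    {
      "type": "Paragraph",
      "content": [
        "Two of these claims follow from one-line computations, sketched here for completeness.\nFor a Lipschitz and strongly monotone operator, applying Lipschitzness first and strong monotonicity second gives"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "tex",
      "text": "\\lVert F(u)-F(v)\\rVert_{2}^{2}\\leq L^{2}\\lVert u-v\\rVert_{2}^{2}\\leq\\frac{L^{2}}{\\mu}\\langle F(u)-F(v),u-v\\rangle."
    },
    {
      "type": "Paragraph",
      "content": [
        "For the bilinear SPP, writing the variable as the pair z = (x, y) and taking the operator in its usual saddle-point form (this explicit form is an assumption of the sketch), one has"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "tex",
      "text": "F(z)=\\begin{pmatrix}Ay\\\\-A^{\\top}x\\end{pmatrix},\\qquad\\langle F(u)-F(v),u-v\\rangle=0\\quad\\text{for all }u,v,"
    },
    {
      "type": "Paragraph",
      "content": [
        "so the right-hand side of the cocoercivity inequality vanishes, while the left-hand side is positive whenever the matrix A is nonzero and the two points are chosen so that the operator values differ.\nHence no finite cocoercivity constant works."
      ]
    },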
    {
      "type": "Paragraph",
      "id": "S4.SS3.p3",
      "content": [
        "There is no need to use an Extragradient method for cocoercive operators.\nOne can apply the iterative scheme (",
        {
          "type": "Cite",
          "target": "S3-E7",
          "content": [
            "7"
          ]
        },
        ") and its modifications for the stochastic case.\nIn spite of this, the first work on cocoercive operators in the stochastic case used Extragradient as the base method [",
        {
          "type": "Cite",
          "target": "bib-bib26",
          "content": [
            "26"
          ]
        },
        "].\nIn that work, the authors investigated methods for finite-sum problems.\nThe subsequent results from [",
        {
          "type": "Cite",
          "target": "bib-bib81",
          "content": [
            "81"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib15",
          "content": [
            "15"
          ]
        },
        "] give an almost complete picture of stochastic algorithms based on method (",
        {
          "type": "Cite",
          "target": "S3-E7",
          "content": [
            "7"
          ]
        },
        ") for operators under Assumption ",
        {
          "type": "Cite",
          "target": "Thm112assumption6",
          "content": [
            "6"
          ]
        },
        ".\nIn particular, the work [",
        {
          "type": "Cite",
          "target": "bib-bib15",
          "content": [
            "15"
          ]
        },
        "] provides a unified analysis for a large number of popular stochastic methods currently known for minimization problems [",
        {
          "type": "Cite",
          "target": "bib-bib51",
          "content": [
            "51"
          ]
        },
        "]."
      ]
    },
    {
      "type": "Heading",
      "id": "S4.SS4",
      "depth": 2,
      "content": [
        "4.4 High-probability convergence"
      ]
    },
    {
      "type": "Paragraph",
      "id": "S4.SS4.p1",
      "content": [
        "Up to this point, we focused on convergence-in-expectation guarantees for stochastic methods, i.e., bounds on ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p1.m1\" alttext=\"\\mathbb{E}[\\operatorname{Gap}_{\\mathrm{VI}}(\\hat{z}^{k})]\" display=\"inline\"><mml:mrow><mml:mi>𝔼</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">[</mml:mo><mml:mrow><mml:msub><mml:mi>Gap</mml:mi><mml:mi>VI</mml:mi></mml:msub><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mover accent=\"true\"><mml:mi>z</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">]</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\mathbb{E}[\\operatorname{Gap}_{\\mathrm{VI}}(\\hat{z}^{k})]"
          }
        },
        " and/or ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p1.m2\" alttext=\"\\mathbb{E}\\lVert z^{k}-z^{*}\\rVert_{2}^{2}\" display=\"inline\"><mml:mrow><mml:mi>𝔼</mml:mi><mml:mo lspace=\"0em\">⁢</mml:mo><mml:msubsup><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\mathbb{E}\\lVert z^{k}-z^{*}\\rVert_{2}^{2}"
          }
        },
        ".\nHowever, ",
        {
          "type": "Emphasis",
          "content": [
            "high-probability convergence guarantees"
          ]
        },
        ", i.e., bounds on ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p1.m3\" alttext=\"\\operatorname{Gap}_{\\mathrm{VI}}(\\hat{z}^{k})\" display=\"inline\"><mml:mrow><mml:msub><mml:mi>Gap</mml:mi><mml:mi>VI</mml:mi></mml:msub><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mover accent=\"true\"><mml:mi>z</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\operatorname{Gap}_{\\mathrm{VI}}(\\hat{z}^{k})"
          }
        },
        " and/or ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p1.m4\" alttext=\"\\lVert z^{k}-z^{*}\\rVert_{2}^{2}\" display=\"inline\"><mml:msubsup><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:math>",
          "meta": {
            "altText": "\\lVert z^{k}-z^{*}\\rVert_{2}^{2}"
          }
        },
        " that hold with probability at least ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p1.m5\" alttext=\"1-\\beta\" display=\"inline\"><mml:mrow><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mi>β</mml:mi></mml:mrow></mml:math>",
          "meta": {
            "altText": "1-\\beta"
          }
        },
        " for a given confidence level ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p1.m6\" alttext=\"\\beta\\in(0,1)\" display=\"inline\"><mml:mrow><mml:mi>β</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\beta\\in(0,1)"
          }
        },
        ", reflect the real behavior of the methods more accurately [",
        {
          "type": "Cite",
          "target": "bib-bib50",
          "content": [
            "50"
          ]
        },
        "].\nDespite this, the high-probability convergence of stochastic methods for solving VIs has been studied in only a couple of works."
      ]
    },
    {
      "type": "Paragraph",
      "id": "S4.SS4.p2",
      "content": [
        "It is worth mentioning that one can always deduce a high-probability bound from an in-expectation one via Markov’s inequality.\nHowever, in this case, the derived rate of convergence will have a polynomial dependence on ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p2.m1\" alttext=\"\\beta^{-1}\" display=\"inline\"><mml:msup><mml:mi>β</mml:mi><mml:mrow><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:math>",
          "meta": {
            "altText": "\\beta^{-1}"
          }
        },
        ".\nSuch guarantees are undesirable, and the goal is to derive rates that have a (poly-)logarithmic dependence on the confidence level, i.e., ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p2.m2\" alttext=\"\\beta\" display=\"inline\"><mml:mi>β</mml:mi></mml:math>",
          "meta": {
            "altText": "\\beta"
          }
        },
        " should appear only in the ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p2.m3\" alttext=\"\\mathcal{O}(\\operatorname{poly}(\\log(\\frac{1}{\\beta})))\" display=\"inline\"><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mi>poly</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mi>log</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>β</mml:mi></mml:mfrac><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\mathcal{O}(\\operatorname{poly}(\\log(\\frac{1}{\\beta})))"
          }
        },
        " factor."
      ]
    },
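    {
      "type": "Paragraph",
      "content": [
        "As a brief sketch of the Markov argument, suppose (purely for illustration; the symbol ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\varepsilon_{k}"
        },
        " is notation assumed here, not taken from the cited works) that the gap is non-negative and that an in-expectation bound ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\mathbb{E}[\\operatorname{Gap}_{\\mathrm{VI}}(\\hat{z}^{k})]\\leq\\varepsilon_{k}"
        },
        " with ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\varepsilon_{k}>0"
        },
        " is available. Then, for any ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\beta\\in(0,1)"
        },
        ", Markov’s inequality gives"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "tex",
      "text": "\\mathbb{P}\\biggl(\\operatorname{Gap}_{\\mathrm{VI}}(\\hat{z}^{k})\\geq\\frac{\\varepsilon_{k}}{\\beta}\\biggr)\\leq\\frac{\\beta\\,\\mathbb{E}[\\operatorname{Gap}_{\\mathrm{VI}}(\\hat{z}^{k})]}{\\varepsilon_{k}}\\leq\\beta,"
    },
    {
      "type": "Paragraph",
      "content": [
        "so that, with probability at least ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "1-\\beta"
        },
        ", the gap is at most ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\varepsilon_{k}/\\beta"
        },
        ": the in-expectation rate is inflated by the factor ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\beta^{-1}"
        },
        ". For instance, ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\beta=10^{-3}"
        },
        " worsens the bound by a factor of 1000, whereas ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\ln(1/\\beta)\\approx 6.9"
        },
        "."
      ]
    },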
    {
      "type": "Paragraph",
      "content": [
        "The first, and for many years the only, high-probability guarantees of this type for solving stochastic VIs were derived in [",
        {
          "type": "Cite",
          "target": "bib-bib60",
          "content": [
            "60"
          ]
        },
        "].\nThe authors assume that ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p3.m1\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
          "meta": {
            "altText": "F"
          }
        },
        " is monotone and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p3.m2\" alttext=\"L\" display=\"inline\"><mml:mi>L</mml:mi></mml:math>",
          "meta": {
            "altText": "L"
          }
        },
        "-Lipschitz, the underlying domain is bounded, and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p3.m3\" alttext=\"F_{\\xi}\" display=\"inline\"><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub></mml:math>",
          "meta": {
            "altText": "F_{\\xi}"
          }
        },
        " is an unbiased estimator with sub-Gaussian (light) tails:"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.Ex23",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex23.m1\" alttext=\"\\mathbb{E}\\biggl[\\exp\\biggl(\\frac{\\lVert F_{\\xi}(x)-F(x)\\rVert_{2}^{2}}{\\sigma^{2}}\\biggr)\\biggr]\\leq\\exp(1).\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:mi>𝔼</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">[</mml:mo><mml:mrow><mml:mi>exp</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mfrac><mml:msubsup><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>ξ</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:mi>F</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:msup><mml:mi>σ</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mfrac><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">]</mml:mo></mml:mrow></mml:mrow><mml:mo>≤</mml:mo><mml:mrow><mml:mi>exp</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\mathbb{E}\\biggl[\\exp\\biggl(\\frac{\\lVert F_{\\xi}(x)-F(x)\\rVert_{2}^{2}}{\\sigma^{2}}\\biggr)\\biggr]\\leq\\exp(1)."
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "The above condition is much stronger than Assumption ",
        {
          "type": "Cite",
          "target": "Thm112assumption3",
          "content": [
            "3"
          ]
        },
        ".\nUnder the listed assumptions, the authors of [",
        {
          "type": "Cite",
          "target": "bib-bib60",
          "content": [
            "60"
          ]
        },
        "] prove that after ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p3.m4\" alttext=\"k\" display=\"inline\"><mml:mi>k</mml:mi></mml:math>",
          "meta": {
            "altText": "k"
          }
        },
        " iterations of Mirror-Prox, with probability at least ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p3.m5\" alttext=\"1-\\beta\" display=\"inline\"><mml:mrow><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mi>β</mml:mi></mml:mrow></mml:math>",
          "meta": {
            "altText": "1-\\beta"
          }
        },
        " (for any ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p3.m6\" alttext=\"\\beta\\in(0,1)\" display=\"inline\"><mml:mrow><mml:mi>β</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\beta\\in(0,1)"
          }
        },
        "), the following inequality holds:"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.Ex24",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex24.m1\" alttext=\"\\operatorname{Gap}_{\\mathrm{VI}}(\\hat{z}^{k})=\\mathcal{O}\\biggl(\\frac{LD_{\\mathcal{Z}}^{2}}{k}+\\frac{\\sigma D_{\\mathcal{Z}}\\log(1/{\\beta})}{\\sqrt{k}}\\biggr).\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>Gap</mml:mi><mml:mi>VI</mml:mi></mml:msub><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mover accent=\"true\"><mml:mi>z</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>L</mml:mi><mml:mo>⁢</mml:mo><mml:msubsup><mml:mi>D</mml:mi><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mi>k</mml:mi></mml:mfrac><mml:mo>+</mml:mo><mml:mfrac><mml:mrow><mml:mi>σ</mml:mi><mml:mo>⁢</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:msub><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mi>log</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mi>β</mml:mi></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:msqrt><mml:mi>k</mml:mi></mml:msqrt></mml:mfrac></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\operatorname{Gap}_{\\mathrm{VI}}(\\hat{z}^{k})=\\mathcal{O}\\biggl(\\frac{LD_{\\mathcal{Z}}^{2}}{k}+\\frac{\\sigma D_{\\mathcal{Z}}\\log(1/{\\beta})}{\\sqrt{k}}\\biggr)."
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "Up to the logarithmic factor, this result coincides with the in-expectation one and is therefore optimal (up to logarithmic factors).\nHowever, the result is derived under the restrictive light-tails assumption."
      ]
    },
    {
      "type": "Paragraph",
      "content": [
        "This last limitation was recently addressed in [",
        {
          "type": "Cite",
          "target": "bib-bib49",
          "content": [
            "49"
          ]
        },
        "], where the authors derive high-probability rates for the considered problem under only the bounded-variance assumption.\nIn particular, they consider clipped-SEG for problems with ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p4.m1\" alttext=\"\\mathcal{Z}=\\mathbb{R}^{d}\" display=\"inline\"><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:mi>d</mml:mi></mml:msup></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\mathcal{Z}=\\mathbb{R}^{d}"
          }
        },
        ":"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.Ex25X",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex25X.m1\" alttext=\"\\displaystyle z^{k+1/2}=z^{k}-\\gamma\\cdot\\operatorname{clip}(F_{\\xi^{k}}(z^{k}),\\lambda_{k}),\" display=\"inline\"><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo lspace=\"0.222em\" rspace=\"0.222em\">⋅</mml:mo><mml:mrow><mml:mi>clip</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:msup><mml:mi>ξ</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mi>λ</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle z^{k+1/2}=z^{k}-\\gamma\\cdot\\operatorname{clip}(F_{\\xi^{k}}(z^{k}),\\lambda_{k}),"
      }
    },
    {
      "type": "MathBlock",
      "id": "S4.Ex25Xa",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex25Xa.m1\" alttext=\"\\displaystyle z^{k+1}=z^{k}-\\gamma\\cdot\\operatorname{clip}(F_{\\xi^{k+1/2}}(z^{k+1/2}),\\lambda_{k+1/2}),\" display=\"inline\"><mml:mrow><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:msup><mml:mi>z</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>−</mml:mo><mml:mrow><mml:mi>γ</mml:mi><mml:mo lspace=\"0.222em\" rspace=\"0.222em\">⋅</mml:mo><mml:mrow><mml:mi>clip</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:msup><mml:mi>ξ</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>z</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mi>λ</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msub><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\displaystyle z^{k+1}=z^{k}-\\gamma\\cdot\\operatorname{clip}(F_{\\xi^{k+1/2}}(z^{k+1/2}),\\lambda_{k+1/2}),"
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "where ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p4.m2\" alttext=\"\\operatorname{clip}(x,\\lambda)=\\min\\{1,{\\lambda}/{\\lVert x\\rVert_{2}}\\}x\" display=\"inline\"><mml:mrow><mml:mrow><mml:mi>clip</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>λ</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mi>min</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">{</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mrow><mml:mi>λ</mml:mi><mml:mo>/</mml:mo><mml:msub><mml:mrow><mml:mo fence=\"true\" lspace=\"0em\" rspace=\"0em\">∥</mml:mo><mml:mi>x</mml:mi><mml:mo fence=\"true\" lspace=\"0em\" rspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo stretchy=\"false\">}</mml:mo></mml:mrow></mml:mrow><mml:mo>⁢</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\operatorname{clip}(x,\\lambda)=\\min\\{1,{\\lambda}/{\\lVert x\\rVert_{2}}\\}x"
          }
        },
        " is the clipping operator, a popular tool in deep learning [",
        {
          "type": "Cite",
          "target": "bib-bib46",
          "content": [
            "46"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib103",
          "content": [
            "103"
          ]
        },
        "].\nIn the setting where ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p4.m3\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
          "meta": {
            "altText": "F"
          }
        },
        " is monotone and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p4.m4\" alttext=\"L\" display=\"inline\"><mml:mi>L</mml:mi></mml:math>",
          "meta": {
            "altText": "L"
          }
        },
        "-Lipschitz and Assumption ",
        {
          "type": "Cite",
          "target": "Thm112assumption3",
          "content": [
            "3"
          ]
        },
        " holds, it is proved in [",
        {
          "type": "Cite",
          "target": "bib-bib49",
          "content": [
            "49"
          ]
        },
        "] that after ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p4.m5\" alttext=\"k\" display=\"inline\"><mml:mi>k</mml:mi></mml:math>",
          "meta": {
            "altText": "k"
          }
        },
        " iterations of clipped-SEG, with probability at least ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p4.m6\" alttext=\"1-\\beta\" display=\"inline\"><mml:mrow><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mi>β</mml:mi></mml:mrow></mml:math>",
          "meta": {
            "altText": "1-\\beta"
          }
        },
        " (for any ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p4.m7\" alttext=\"\\beta\\in(0,1)\" display=\"inline\"><mml:mrow><mml:mi>β</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\beta\\in(0,1)"
          }
        },
        "), the following inequality holds:"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S4.Ex26",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.Ex26.m1\" alttext=\"\\operatorname{Gap}_{\\mathrm{VI}}(\\hat{z}^{k})=\\mathcal{O}\\biggl(\\frac{LR_{0}^{2}\\log({k}/{\\beta})}{k}+\\frac{\\sigma R_{0}\\sqrt{\\log({k}{/\\beta})}}{\\sqrt{k}}\\biggr).\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>Gap</mml:mi><mml:mi>VI</mml:mi></mml:msub><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mover accent=\"true\"><mml:mi>z</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>L</mml:mi><mml:mo>⁢</mml:mo><mml:msubsup><mml:mi>R</mml:mi><mml:mn>0</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mi>log</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>/</mml:mo><mml:mi>β</mml:mi></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mi>k</mml:mi></mml:mfrac><mml:mo>+</mml:mo><mml:mfrac><mml:mrow><mml:mi>σ</mml:mi><mml:mo>⁢</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo>⁢</mml:mo><mml:msqrt><mml:mrow><mml:mi>log</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>/</mml:mo><mml:mi>β</mml:mi></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:msqrt></mml:mrow><mml:msqrt><mml:mi>k</mml:mi></mml:msqrt></mml:mfrac></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo lspace=\"0em\">.</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\operatorname{Gap}_{\\mathrm{VI}}(\\hat{z}^{k})=\\mathcal{O}\\biggl(\\frac{LR_{0}^{2}\\log({k}/{\\beta})}{k}+\\frac{\\sigma R_{0}\\sqrt{\\log({k}{/\\beta})}}{\\sqrt{k}}\\biggr)."
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "Up to differences in the logarithmic factors, the definition of ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p4.m8\" alttext=\"\\sigma\" display=\"inline\"><mml:mi>σ</mml:mi></mml:math>",
          "meta": {
            "altText": "\\sigma"
          }
        },
        ", and the difference between ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p4.m9\" alttext=\"D_{\\mathcal{Z}}\" display=\"inline\"><mml:msub><mml:mi>D</mml:mi><mml:mi class=\"ltx_font_mathcaligraphic\">𝒵</mml:mi></mml:msub></mml:math>",
          "meta": {
            "altText": "D_{\\mathcal{Z}}"
          }
        },
        " and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p4.m10\" alttext=\"R_{0}\" display=\"inline\"><mml:msub><mml:mi>R</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:math>",
          "meta": {
            "altText": "R_{0}"
          }
        },
        ", the rate coincides with the one from [",
        {
          "type": "Cite",
          "target": "bib-bib60",
          "content": [
            "60"
          ]
        },
        "], but it was derived without the light-tails assumption.\nThe key algorithmic tool that allows removing the light-tails assumption is clipping: with a proper choice of the clipping level ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p4.m11\" alttext=\"\\lambda\" display=\"inline\"><mml:mi>λ</mml:mi></mml:math>",
          "meta": {
            "altText": "\\lambda"
          }
        },
        " the authors cut heavy tails without making the bias too large.\nIt is worth mentioning that the result for clipped-SEG is derived for the unconstrained case and the rate depends on ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p4.m12\" alttext=\"R_{0}\" display=\"inline\"><mml:msub><mml:mi>R</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:math>",
          "meta": {
            "altText": "R_{0}"
          }
        },
        ", while in [",
        {
          "type": "Cite",
          "target": "bib-bib60",
          "content": [
            "60"
          ]
        },
        "], the analysis relies on the boundedness of the domain, whose diameter appears explicitly in the resulting rate.\nTo remove the dependence on the diameter of the domain, the authors of [",
        {
          "type": "Cite",
          "target": "bib-bib48",
          "content": [
            "48"
          ]
        },
        "] show that, with high probability, the iterates produced by clipped-SEG stay in a ball around ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p4.m13\" alttext=\"x^{*}\" display=\"inline\"><mml:msup><mml:mi>x</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:math>",
          "meta": {
            "altText": "x^{*}"
          }
        },
        " with a radius proportional to ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p4.m14\" alttext=\"R_{0}\" display=\"inline\"><mml:msub><mml:mi>R</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:math>",
          "meta": {
            "altText": "R_{0}"
          }
        },
        ".\nUsing this trick, they also show that it is sufficient that all the assumptions (monotonicity and Lipschitzness of ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p4.m15\" alttext=\"F\" display=\"inline\"><mml:mi>F</mml:mi></mml:math>",
          "meta": {
            "altText": "F"
          }
        },
        " and bounded variance) hold only on this ball.\nThis generality allows them to cover problems that are non-Lipschitz on ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S4.SS4.p4.m16\" alttext=\"\\mathbb{R}^{d}\" display=\"inline\"><mml:msup><mml:mi>ℝ</mml:mi><mml:mi>d</mml:mi></mml:msup></mml:math>",
          "meta": {
            "altText": "\\mathbb{R}^{d}"
          }
        },
        " (e.g., for certain monotone operators with polynomial growth), as well as the situation where the variance is bounded only on a compact set, which is common for many finite-sum problems.\nFinally, [",
        {
          "type": "Cite",
          "target": "bib-bib48",
          "content": [
            "48"
          ]
        },
        "] contains high-probability convergence results for strongly monotone VIs and VIs with structured non-monotonicity."
      ]
    },
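    {
      "type": "Paragraph",
      "content": [
        "To illustrate why clipping tames heavy tails, two elementary properties of the clipping operator defined above can be written out explicitly. This is a sketch for intuition only, not the analysis of the cited works; here ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "x"
        },
        " is assumed to be an arbitrary nonzero vector and ",
        {
          "type": "MathFragment",
          "mathLanguage": "tex",
          "text": "\\lambda>0"
        },
        ":"
      ]
    },
    {
      "type": "MathBlock",
      "mathLanguage": "tex",
      "text": "\\lVert\\operatorname{clip}(x,\\lambda)\\rVert_{2}=\\min\\{\\lVert x\\rVert_{2},\\lambda\\}\\leq\\lambda,\\qquad\\operatorname{clip}(x,\\lambda)=x\\quad\\text{whenever }\\lVert x\\rVert_{2}\\leq\\lambda."
    },
    {
      "type": "Paragraph",
      "content": [
        "Thus every clipped stochastic estimate is bounded in norm by the clipping level, no matter how heavy the tails of the noise are, while the estimate is left unchanged, and so contributes nothing to the bias, on the event that its norm does not exceed the clipping level. A smaller clipping level gives a tighter bound on the estimate at the price of a larger bias, which is the trade-off behind the choice of the clipping level."
      ]
    },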
    {
      "type": "Heading",
      "id": "S5",
      "depth": 1,
      "content": [
        "5 Recent advances"
      ]
    },
    {
      "type": "Paragraph",
      "id": "S5.p1",
      "content": [
        "In this section, we briefly report on a few recent theoretical advances with practical impact."
      ]
    },
    {
      "type": "Heading",
      "id": "S5.SS1",
      "depth": 2,
      "content": [
        "5.1 Saddle point problems with different constants of strong convexity and strong concavity"
      ]
    },
    {
      "type": "Paragraph",
      "content": [
        "Saddle point problems with different constants of strong convexity and strong concavity started attracting interest a few years ago; see, e.g., [",
        {
          "type": "Cite",
          "target": "bib-bib4",
          "content": [
            "4"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib77",
          "content": [
            "77"
          ]
        },
        "].\nHowever, even for the particular case"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S5.Ex27",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.Ex27.m1\" alttext=\"\\min_{x\\in\\mathbb{R}^{d_{x}}}\\max_{y\\in\\mathbb{R}^{d_{y}}}g(x,y)=f(x)+y^{\\top}\\mathbf{A}x-h(y),\" display=\"block\"><mml:mrow><mml:mrow><mml:mrow><mml:mrow><mml:munder><mml:mi>min</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mi>x</mml:mi></mml:msub></mml:msup></mml:mrow></mml:munder><mml:mo lspace=\"0.167em\">⁡</mml:mo><mml:mrow><mml:munder><mml:mi>max</mml:mi><mml:mrow><mml:mi>y</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:msup></mml:mrow></mml:munder><mml:mo lspace=\"0.167em\">⁡</mml:mo><mml:mi>g</mml:mi></mml:mrow></mml:mrow><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mi>f</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:msup><mml:mi>y</mml:mi><mml:mo>⊤</mml:mo></mml:msup><mml:mo>⁢</mml:mo><mml:mi>𝐀</mml:mi><mml:mo>⁢</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:mi>h</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\min_{x\\in\\mathbb{R}^{d_{x}}}\\max_{y\\in\\mathbb{R}^{d_{y}}}g(x,y)=f(x)+y^{\\top}\\mathbf{A}x-h(y),"
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "where the function ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS1.p1.m1\" alttext=\"f\" display=\"inline\"><mml:mi>f</mml:mi></mml:math>",
          "meta": {
            "altText": "f"
          }
        },
        " is ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS1.p1.m2\" alttext=\"\\mu_{x}\" display=\"inline\"><mml:msub><mml:mi>μ</mml:mi><mml:mi>x</mml:mi></mml:msub></mml:math>",
          "meta": {
            "altText": "\\mu_{x}"
          }
        },
        "-strongly convex (",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS1.p1.m3\" alttext=\"\\mu_{x}>0\" display=\"inline\"><mml:mrow><mml:msub><mml:mi>μ</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mo>&gt;</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\mu_{x}>0"
          }
        },
        ") and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS1.p1.m4\" alttext=\"L_{x}\" display=\"inline\"><mml:msub><mml:mi>L</mml:mi><mml:mi>x</mml:mi></mml:msub></mml:math>",
          "meta": {
            "altText": "L_{x}"
          }
        },
        "-smooth, and the function ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS1.p1.m5\" alttext=\"h\" display=\"inline\"><mml:mi>h</mml:mi></mml:math>",
          "meta": {
            "altText": "h"
          }
        },
        " is ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS1.p1.m6\" alttext=\"\\mu_{y}\" display=\"inline\"><mml:msub><mml:mi>μ</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:math>",
          "meta": {
            "altText": "\\mu_{y}"
          }
        },
        "-strongly convex (",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS1.p1.m7\" alttext=\"\\mu_{y}>0\" display=\"inline\"><mml:mrow><mml:msub><mml:mi>μ</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>&gt;</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\mu_{y}>0"
          }
        },
        ") and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS1.p1.m8\" alttext=\"L_{y}\" display=\"inline\"><mml:msub><mml:mi>L</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:math>",
          "meta": {
            "altText": "L_{y}"
          }
        },
        "-smooth, optimal algorithms have been proposed only recently [",
        {
          "type": "Cite",
          "target": "bib-bib72",
          "content": [
            "72"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib116",
          "content": [
            "116"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib58",
          "content": [
            "58"
          ]
        },
        "].\nThese algorithms have the convergence rates"
      ]
    },
    {
      "type": "MathBlock",
      "id": "S5.Ex28",
      "mathLanguage": "mathml",
      "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.Ex28.m1\" alttext=\"\\mathcal{O}\\biggl(\\biggl(\\sqrt{\\frac{L_{x}}{\\mu_{x}}}+\\sqrt{\\frac{\\lambda_{\\mathrm{max}}(\\mathbf{A}^{\\top}\\mathbf{A})}{\\mu_{x}\\mu_{y}}}+\\sqrt{\\frac{L_{y}}{\\mu_{y}}}\\biggr)\\log{\\frac{1}{\\varepsilon}}\\biggr)\" display=\"block\"><mml:mrow><mml:mi class=\"ltx_font_mathcaligraphic\">𝒪</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">(</mml:mo><mml:mrow><mml:msqrt><mml:mfrac><mml:msub><mml:mi>L</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:msub><mml:mi>μ</mml:mi><mml:mi>x</mml:mi></mml:msub></mml:mfrac></mml:msqrt><mml:mo>+</mml:mo><mml:msqrt><mml:mfrac><mml:mrow><mml:msub><mml:mi>λ</mml:mi><mml:mi>max</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msup><mml:mi>𝐀</mml:mi><mml:mo>⊤</mml:mo></mml:msup><mml:mo>⁢</mml:mo><mml:mi>𝐀</mml:mi></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mi>μ</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:msub><mml:mi>μ</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:msqrt><mml:mo>+</mml:mo><mml:msqrt><mml:mfrac><mml:msub><mml:mi>L</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:msub><mml:mi>μ</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mfrac></mml:msqrt></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow><mml:mo lspace=\"0.167em\">⁢</mml:mo><mml:mrow><mml:mi>log</mml:mi><mml:mo lspace=\"0.167em\">⁡</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>ε</mml:mi></mml:mfrac></mml:mrow></mml:mrow><mml:mo maxsize=\"210%\" minsize=\"210%\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
      "meta": {
        "altText": "\\mathcal{O}\\biggl(\\biggl(\\sqrt{\\frac{L_{x}}{\\mu_{x}}}+\\sqrt{\\frac{\\lambda_{\\mathrm{max}}(\\mathbf{A}^{\\top}\\mathbf{A})}{\\mu_{x}\\mu_{y}}}+\\sqrt{\\frac{L_{y}}{\\mu_{y}}}\\biggr)\\log{\\frac{1}{\\varepsilon}}\\biggr)"
      }
    },
    {
      "type": "Paragraph",
      "content": [
        "and attain the lower bound, which was obtained in [",
        {
          "type": "Cite",
          "target": "bib-bib56",
          "content": [
            "56"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib124",
          "content": [
            "124"
          ]
        },
        "] (here one needs to assume that ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS1.p1.m9\" alttext=\"\\lambda_{\\mathrm{min}}(\\mathbf{A}^{\\top}\\mathbf{A})\\leq\\sqrt{\\mu_{x}\\mu_{y}}\" display=\"inline\"><mml:mrow><mml:mrow><mml:msub><mml:mi>λ</mml:mi><mml:mi>min</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:msup><mml:mi>𝐀</mml:mi><mml:mo>⊤</mml:mo></mml:msup><mml:mo>⁢</mml:mo><mml:mi>𝐀</mml:mi></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo>≤</mml:mo><mml:msqrt><mml:mrow><mml:msub><mml:mi>μ</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:msub><mml:mi>μ</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:msqrt></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\lambda_{\\mathrm{min}}(\\mathbf{A}^{\\top}\\mathbf{A})\\leq\\sqrt{\\mu_{x}\\mu_{y}}"
          }
        },
        "; without this assumption no optimal methods are known)."
      ]
    },
    {
      "type": "Paragraph",
      "id": "S5.SS1.p2",
      "content": [
        "Note that the algorithm from [",
        {
          "type": "Cite",
          "target": "bib-bib58",
          "content": [
            "58"
          ]
        },
        "] is built upon a technique related to the analysis of primal-dual Extragradient methods via relative Lipschitzness [",
        {
          "type": "Cite",
          "target": "bib-bib28",
          "content": [
            "28"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib115",
          "content": [
            "115"
          ]
        },
        "].\nAs a by-product, this technique makes it possible to obtain Nesterov’s accelerated method as a particular case of primal-dual Extragradient method with relative Lipschitzness [",
        {
          "type": "Cite",
          "target": "bib-bib28",
          "content": [
            "28"
          ]
        },
        "]."
      ]
    },
    {
      "type": "Paragraph",
      "id": "S5.SS1.p3",
      "content": [
        "For the non-bilinear SPP, optimal methods, based on the accelerated Monteiro–Svaiter proximal envelope, were developed only in the non-composite case [",
        {
          "type": "Cite",
          "target": "bib-bib24",
          "content": [
            "24"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib71",
          "content": [
            "71"
          ]
        },
        "].\nFor the non-bilinear SPP with composite terms, there is a poly-logarithmic gap between the lower bound and the best known upper bounds [",
        {
          "type": "Cite",
          "target": "bib-bib118",
          "content": [
            "118"
          ]
        },
        "].\nA gap also appears for the SPP with stochastic finite-sum structure [",
        {
          "type": "Cite",
          "target": "bib-bib58",
          "content": [
            "58"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib82",
          "content": [
            "82"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib118",
          "content": [
            "118"
          ]
        },
        "].\nThe stochastic setting with bounded variance was considered in [",
        {
          "type": "Cite",
          "target": "bib-bib32",
          "content": [
            "32"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib85",
          "content": [
            "85"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib125",
          "content": [
            "125"
          ]
        },
        "]."
      ]
    },
    {
      "type": "Paragraph",
      "id": "S5.SS1.p4",
      "content": [
        "Further deterministic “cutting-plane” improvements are connected with the additional assumptions about small dimension of the involved vectors ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS1.p4.m1\" alttext=\"x\" display=\"inline\"><mml:mi>x</mml:mi></mml:math>",
          "meta": {
            "altText": "x"
          }
        },
        " or/and ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS1.p4.m2\" alttext=\"y\" display=\"inline\"><mml:mi>y</mml:mi></mml:math>",
          "meta": {
            "altText": "y"
          }
        },
        " (see [",
        {
          "type": "Cite",
          "target": "bib-bib43",
          "content": [
            "43"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib44",
          "content": [
            "44"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib91",
          "content": [
            "91"
          ]
        },
        "]) or with different structural (e.g., SPP on balls in ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS1.p4.m3\" alttext=\"1\" display=\"inline\"><mml:mn>1</mml:mn></mml:math>",
          "meta": {
            "altText": "1"
          }
        },
        "- or ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS1.p4.m4\" alttext=\"\\infty\" display=\"inline\"><mml:mi mathvariant=\"normal\">∞</mml:mi></mml:math>",
          "meta": {
            "altText": "\\infty"
          }
        },
        "-norms) and sparsity assumptions, see e.g., [",
        {
          "type": "Cite",
          "target": "bib-bib21",
          "content": [
            "21"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib111",
          "content": [
            "111"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib112",
          "content": [
            "112"
          ]
        },
        "] and references therein.\nHere lower bounds are mostly unknown."
      ]
    },
    {
      "type": "Paragraph",
      "id": "S5.SS1.p5",
      "content": [
        "In this subsection we mentioned many works dealing with (sub-)optimal algorithms for different variants of SPP.\nWe note that, in contrast to convex optimization, where the oracle call is uniquely associated with the gradient call ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS1.p5.m1\" alttext=\"\\nabla f(x)\" display=\"inline\"><mml:mrow><mml:mrow><mml:mo rspace=\"0.167em\">∇</mml:mo><mml:mi>f</mml:mi></mml:mrow><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\nabla f(x)"
          }
        },
        ", for SPP we have two criteria: the number of ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS1.p5.m2\" alttext=\"\\nabla_{x}g(x,y)\" display=\"inline\"><mml:mrow><mml:mrow><mml:msub><mml:mo>∇</mml:mo><mml:mi>x</mml:mi></mml:msub><mml:mi>g</mml:mi></mml:mrow><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\nabla_{x}g(x,y)"
          }
        },
        "-calls and that of ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS1.p5.m3\" alttext=\"\\nabla_{y}g(x,y)\" display=\"inline\"><mml:mrow><mml:mrow><mml:msub><mml:mo>∇</mml:mo><mml:mi>y</mml:mi></mml:msub><mml:mi>g</mml:mi></mml:mrow><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow></mml:math>",
          "meta": {
            "altText": "\\nabla_{y}g(x,y)"
          }
        },
        "-calls (and more variants for SPP with composites).\n“Optimality” in the most of the aforementioned papers means that the method is optimal according to the worst of the criteria.\nIn [",
        {
          "type": "Cite",
          "target": "bib-bib4",
          "content": [
            "4"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib118",
          "content": [
            "118"
          ]
        },
        "], the authors consider these criteria separately.\nHowever, the development of the lower bounds and optimal methods for a multi-criterion setup is still an open problem."
      ]
    },
    {
      "type": "Heading",
      "id": "S5.SS2",
      "depth": 2,
      "content": [
        "5.2 Adaptive methods for VI and SPP"
      ]
    },
    {
      "type": "Paragraph",
      "id": "S5.SS2.p1",
      "content": [
        "Interest in adaptive algorithms for stochastic convex optimization mainly arose in 2011 after the development of the AdaGrad (adaptive gradient) [",
        {
          "type": "Cite",
          "target": "bib-bib33",
          "content": [
            "33"
          ]
        },
        "] and Adam (adaptive moment estimation) [",
        {
          "type": "Cite",
          "target": "bib-bib63",
          "content": [
            "63"
          ]
        },
        "] algorithms.\nFor variational inequalities and saddle point problems, people became interested in adaptive methods only in the last few years, see, e.g., [",
        {
          "type": "Cite",
          "target": "bib-bib8",
          "content": [
            "8"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib40",
          "content": [
            "40"
          ]
        },
        "] (see also [",
        {
          "type": "Cite",
          "target": "bib-bib61",
          "content": [
            "61"
          ]
        },
        "]).\nCurrently, this area of research is well developed.\nOne can mention here works devoted to both adaptive step sizes [",
        {
          "type": "Cite",
          "target": "bib-bib5",
          "content": [
            "5"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib34",
          "content": [
            "34"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib35",
          "content": [
            "35"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib114",
          "content": [
            "114"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib117",
          "content": [
            "117"
          ]
        },
        "] and adaptive scaling/preconditioning [",
        {
          "type": "Cite",
          "target": "bib-bib12",
          "content": [
            "12"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib31",
          "content": [
            "31"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib80",
          "content": [
            "80"
          ]
        },
        "].\nApproaches from the second group are based on the idea of a proper combination of AdaGrad/Adam with Extragradient or its modifications.\nAll of the mentioned adaptive methods have no better (typically the same) theoretical rates of convergence than their non-adaptive analogues, but require less input information or demonstrate better performance in practice."
      ]
    },
    {
      "type": "Heading",
      "id": "S5.SS3",
      "depth": 2,
      "content": [
        "5.3 Quasi-Newton and tensor methods for VI and SPP"
      ]
    },
    {
      "type": "Paragraph",
      "id": "S5.SS3.p1",
      "content": [
        "Quasi-Newton methods for solving nonlinear equations (unconstrained VI) and SPP are proposed in [",
        {
          "type": "Cite",
          "target": "bib-bib75",
          "content": [
            "75"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib121",
          "content": [
            "121"
          ]
        },
        "] and [",
        {
          "type": "Cite",
          "target": "bib-bib79",
          "content": [
            "79"
          ]
        },
        "], respectively.\nIn these papers, local superlinear rates of convergence are derived for the modifications of the Broyden-type methods for solving nonlinear equations with Lipschitz Jacobian and SPP with Lipschitz Hessian.\nStochastic versions of these methods for VI and SPP still await to be developed."
      ]
    },
    {
      "type": "Paragraph",
      "id": "S5.SS3.p2",
      "content": [
        "Tensor methods for convex optimization problems are currently quite well developed.\nIn particular, starting with [",
        {
          "type": "Cite",
          "target": "bib-bib99",
          "content": [
            "99"
          ]
        },
        "] it has been shown that optimal second- and third-order methods can be implemented with almost the same complexity of each iteration as the Newton method [",
        {
          "type": "Cite",
          "target": "bib-bib89",
          "content": [
            "89"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib97",
          "content": [
            "97"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib39",
          "content": [
            "39"
          ]
        },
        "].\nMoreover, optimal ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS3.p2.m1\" alttext=\"p\" display=\"inline\"><mml:mi>p</mml:mi></mml:math>",
          "meta": {
            "altText": "p"
          }
        },
        "-order methods (which use ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS3.p2.m2\" alttext=\"p\" display=\"inline\"><mml:mi>p</mml:mi></mml:math>",
          "meta": {
            "altText": "p"
          }
        },
        "-order derivatives) significantly reduce the rate of convergence from ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS3.p2.m3\" alttext=\"k^{-2}\" display=\"inline\"><mml:msup><mml:mi>k</mml:mi><mml:mrow><mml:mo>−</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math>",
          "meta": {
            "altText": "k^{-2}"
          }
        },
        " to ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS3.p2.m4\" alttext=\"k^{-(3p+1)/2}\" display=\"inline\"><mml:msup><mml:mi>k</mml:mi><mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mrow><mml:mn>3</mml:mn><mml:mo>⁢</mml:mo><mml:mi>p</mml:mi></mml:mrow><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup></mml:math>",
          "meta": {
            "altText": "k^{-(3p+1)/2}"
          }
        },
        " (see [",
        {
          "type": "Cite",
          "target": "bib-bib70",
          "content": [
            "70"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib23",
          "content": [
            "23"
          ]
        },
        "]).\nFor VI and SPP, the study was initiated in [",
        {
          "type": "Cite",
          "target": "bib-bib94",
          "content": [
            "94"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib88",
          "content": [
            "88"
          ]
        },
        "] and optimal ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS3.p2.m5\" alttext=\"p\" display=\"inline\"><mml:mi>p</mml:mi></mml:math>",
          "meta": {
            "altText": "p"
          }
        },
        "-order methods reduce the rate of convergence from ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS3.p2.m6\" alttext=\"k^{-1}\" display=\"inline\"><mml:msup><mml:mi>k</mml:mi><mml:mrow><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:math>",
          "meta": {
            "altText": "k^{-1}"
          }
        },
        " to ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS3.p2.m7\" alttext=\"k^{-(p+1)/2}\" display=\"inline\"><mml:msup><mml:mi>k</mml:mi><mml:mrow><mml:mo>−</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msup></mml:math>",
          "meta": {
            "altText": "k^{-(p+1)/2}"
          }
        },
        " (see [",
        {
          "type": "Cite",
          "target": "bib-bib1",
          "content": [
            "1"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib78",
          "content": [
            "78"
          ]
        },
        "]) (for ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS3.p2.m8\" alttext=\"k^{-1}\" display=\"inline\"><mml:msup><mml:mi>k</mml:mi><mml:mrow><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:math>",
          "meta": {
            "altText": "k^{-1}"
          }
        },
        ", see Theorem ",
        {
          "type": "Cite",
          "target": "Thm112theorem3",
          "content": [
            "3"
          ]
        },
        ").\nHowever, in contrast to convex optimization, the use of tensor methods for sufficiently smooth monotone VIs and convex-concave saddle point problems is not expected to be as effective.\nNote that in [",
        {
          "type": "Cite",
          "target": "bib-bib1",
          "content": [
            "1"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib78",
          "content": [
            "78"
          ]
        },
        "] one can also find optimal rates for strongly monotone VIs and strongly convex-concave SPP.\nStochastic tensor methods for variational inequalities and saddle point problems still await to be developed."
      ]
    },
    {
      "type": "Heading",
      "id": "S5.SS4",
      "depth": 2,
      "content": [
        "5.4 Convergence in terms of the gradient norm for SPP"
      ]
    },
    {
      "type": "Paragraph",
      "id": "S5.SS4.p1",
      "content": [
        "Several recent advances in the development of optimal algorithms are based on accelerated proximal envelopes with proper stopping rules for inner loop algorithms [",
        {
          "type": "Cite",
          "target": "bib-bib71",
          "content": [
            "71"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib70",
          "content": [
            "70"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib68",
          "content": [
            "68"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib109",
          "content": [
            "109"
          ]
        },
        "].\nSuch rules are built upon the norm of the gradient calculated for the target function of the inner problem."
      ]
    },
    {
      "type": "Paragraph",
      "id": "S5.SS4.p2",
      "content": [
        "For smooth convex optimization problems, Yu. Nesterov in 2012 posed the problem of making the gradient norm small with the same rate of convergence as a gap in the function values, i.e., proportional to ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS4.p2.m1\" alttext=\"k^{-2}\" display=\"inline\"><mml:msup><mml:mi>k</mml:mi><mml:mrow><mml:mo>−</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math>",
          "meta": {
            "altText": "k^{-2}"
          }
        },
        " (see [",
        {
          "type": "Cite",
          "target": "bib-bib96",
          "content": [
            "96"
          ]
        },
        "]).\nTo address this problem, in [",
        {
          "type": "Cite",
          "target": "bib-bib96",
          "content": [
            "96"
          ]
        },
        "] he proposed an optimal (up to a logarithmic factor) algorithm.\nThis question was further investigated, leading to optimal results without additional logarithmic factors [",
        {
          "type": "Cite",
          "target": "bib-bib62",
          "content": [
            "62"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib98",
          "content": [
            "98"
          ]
        },
        "] (see also [",
        {
          "type": "Cite",
          "target": "bib-bib30",
          "content": [
            "30"
          ]
        },
        "] for explanations and a survey).\nIn the stochastic case, algorithms were presented in [",
        {
          "type": "Cite",
          "target": "bib-bib38",
          "content": [
            "38"
          ]
        },
        "]."
      ]
    },
    {
      "type": "Paragraph",
      "id": "S5.SS4.p3",
      "content": [
        "For smooth convex-concave saddle point problems an optimal algorithm with ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS4.p3.m1\" alttext=\"\\lVert\\nabla_{x,y}f(x^{k},y^{k})\\rVert_{2}\" display=\"inline\"><mml:msub><mml:mrow><mml:mo fence=\"true\" rspace=\"0em\">∥</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mo rspace=\"0.167em\">∇</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mi>f</mml:mi></mml:mrow><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy=\"false\">(</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mi>y</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo stretchy=\"false\">)</mml:mo></mml:mrow></mml:mrow><mml:mo fence=\"true\" lspace=\"0em\">∥</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msub></mml:math>",
          "meta": {
            "altText": "\\lVert\\nabla_{x,y}f(x^{k},y^{k})\\rVert_{2}"
          }
        },
        " proportional to ",
        {
          "type": "MathFragment",
          "mathLanguage": "mathml",
          "text": "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" id=\"S5.SS4.p3.m2\" alttext=\"k^{-1}\" display=\"inline\"><mml:msup><mml:mi>k</mml:mi><mml:mrow><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:math>",
          "meta": {
            "altText": "k^{-1}"
          }
        },
        " was proposed in [",
        {
          "type": "Cite",
          "target": "bib-bib122",
          "content": [
            "122"
          ]
        },
        "] (see also [",
        {
          "type": "Cite",
          "target": "bib-bib30",
          "content": [
            "30"
          ]
        },
        "] and [",
        {
          "type": "Cite",
          "target": "bib-bib71",
          "content": [
            "71"
          ]
        },
        "] for monotone inclusion).\nFor the stochastic case, see [",
        {
          "type": "Cite",
          "target": "bib-bib74",
          "content": [
            "74"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib20",
          "content": [
            "20"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib27",
          "content": [
            "27"
          ]
        },
        "]."
      ]
    },
    {
      "type": "Heading",
      "id": "S5.SS5",
      "depth": 2,
      "content": [
        "5.5 Decentralized VI and SPP"
      ]
    },
    {
      "type": "Paragraph",
      "id": "S5.SS5.p1",
      "content": [
        "In practice, in order to solve a variational inequality problem more efficiently and quickly, one usually resorts to distributed methods.\nIn particular, methods that work on arbitrary (possibly time-varying) decentralized communication networks between computing devices are popular."
      ]
    },
    {
      "type": "Paragraph",
      "id": "S5.SS5.p2",
      "content": [
        "While the field of decentralized algorithms for minimization problems has been extensively investigated, results for broader classes of problems have only begun to appear in recent years.\nSuch works are primarily focused on saddle point problems [",
        {
          "type": "Cite",
          "target": "bib-bib90",
          "content": [
            "90"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib17",
          "content": [
            "17"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib108",
          "content": [
            "108"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib18",
          "content": [
            "18"
          ]
        },
        ", ",
        {
          "type": "Cite",
          "target": "bib-bib16",
          "content": [
            "16"
          ]
        },
        "], but we note that most of these results can easily be extended to variational inequalities.\nLet us emphasize two works that were from the outset devoted to VIs.\nIn [",
        {
          "type": "Cite",
          "target": "bib-bib13",
          "content": [
            "13"
          ]
        },
        "], the authors proposed a decentralized method with local steps, and [",
        {
          "type": "Cite",
          "target": "bib-bib69",
          "content": [
            "69"
          ]
        },
        "] presented optimal decentralized methods for stochastic (finite-sum) variational inequalities on fixed and varying networks."
      ]
    },
    {
      "type": "Paragraph",
      "id": "S5.SS5.p3",
      "content": [
        {
          "type": "Emphasis",
          "content": [
            "Acknowledgements. "
          ]
        },
        "\nThe work was supported by Russian Science Foundation (project No. 21-71-30005)."
      ]
    },
    {
      "type": "Paragraph",
      "id": "authorinfo",
      "content": [
        "\nAleksandr Beznosikov is a PhD student at the Moscow Institute of Physics and Technology (Moscow, Russia).\nHe is also a researcher at the Laboratory of Mathematical Methods of Optimization and at Laboratory of Advanced Combinatorics and Network Applications in the Moscow Institute of Physics and Technology, a junior researcher at the International Laboratory of SA and HDI in the Higher School of Economics (Moscow), a research intern at Yandex Research (Moscow).\nHis current research interests are concentrated around variational inequalities, saddle point problems, distributed optimization, stochastic optimization, machine learning and federated learning.\n",
        {
          "type": "Link",
          "target": "mailto:anbeznosikov@gmail.com",
          "content": [
            "anbeznosikov@gmail.com"
          ]
        },
        "\nBoris Polyak (1935–2023) was head of the Ya. Z. Tsypkin Laboratory of the Institute for Control Science of the Russian Academy of Sciences in Moscow and a professor at the Moscow University of Physics and Engineering.\nHe received a PhD degree in mathematics from Moscow State University in 1963 and a Doctor of Science degree (habilitation) in engineering from the Institute for Control Science of the Russian Academy of Sciences in Moscow in 1977.\nHe authored or coauthored more than 250 papers in peer-reviewed journals as well as four monographs, including “Introduction to Optimization”.\nHe was an IFAC Fellow, a recipient of the EURO-2012 Gold Medal and the INFORMS Optimization Society Khyachyan Prize.\nHis main area of research was optimization algorithms and optimal control.",
        "\nEduard Gorbunov is a researcher at the Laboratory of Mathematical Methods of Optimization in the Moscow Institute of Physics and Technology. His current research interests are concentrated around stochastic optimization and its applications to machine learning, distributed optimization, derivative-free optimization, and variational inequalities.\n",
        {
          "type": "Link",
          "target": "mailto:ed-gorbunov@yandex.ru",
          "content": [
            "ed-gorbunov@yandex.ru"
          ]
        },
        "\nDmitry Kovalev is a PhD student at King Abdullah University of Science and Technology (Thuwal, Saudi Arabia).\nIn 2020 and 2021 he received the Yandex Award (Ilya Segalovich Award).\nHis current research interests include continuous optimization and machine learning.\n",
        {
          "type": "Link",
          "target": "mailto:dakovalev1@gmail.com",
          "content": [
            "dakovalev1@gmail.com"
          ]
        },
        "\nAlexander Gasnikov is a professor at the Moscow Institute of Physics and Technology, head of the Laboratory Mathematical Methods of Optimization and head of the department Mathematical Foundations of Control.\nHe received a Doctor of Science degree (habilitation) in mathematics in 2016 from the Faculty of Control and Applied Mathematics of the Moscow Institute of Physics and Technology.\nIn 2019 he received an award from the Yahoo Faculty Research and Engagement Program.\nIn 2020 he received the Yandex Award (Ilya Segalovich Award).\nIn 2021 he received the Award for Young Scientists from the Moscow government.\nHis main area of research is optimization algorithms.\n",
        {
          "type": "Link",
          "target": "mailto:gasnikov@yandex.ru",
          "content": [
            "gasnikov@yandex.ru"
          ]
        }
      ]
    }
  ]
}