Optimization of Wiener-Hammerstein-type signal pre-distortion functionals for eliminating intermodulation components arising during power amplification. Dissertation and abstract topic (VAK RF specialty 00.00.00), Candidate of Sciences Aleksandr Yurievich Maslovskiy

  • Aleksandr Yurievich Maslovskiy
  • Candidate of Sciences
  • 2024, Moscow Institute of Physics and Technology (National Research University)
  • VAK RF specialty 00.00.00
  • Number of pages: 100
Maslovskiy, Aleksandr Yurievich. Optimization of Wiener-Hammerstein-type signal pre-distortion functionals for eliminating intermodulation components arising during power amplification: Candidate of Sciences dissertation: 00.00.00, Other specialties. Moscow Institute of Physics and Technology (National Research University), 2024. 100 p.

Dissertation table of contents, Candidate of Sciences Aleksandr Yurievich Maslovskiy

Contents

Introduction

Chapter 1. Power Amplifier and Digital Pre-Distortion

1.1 An overview of the Power Amplifier

1.2 Nonlinearity of PA

1.3 Digital Predistorter

Chapter 2. Exploiting different kinds of optimization approaches in non-convex optimization problems based on Wiener-Hammerstein functions

2.1 Full gradient Methods

2.1.1 Heavy Ball method

2.1.2 Conjugate gradients method (CG)

2.1.3 Yu. Nesterov conjugate gradient method

2.1.4 Numerical experiments

2.2 Quasi-Newton methods

2.2.1 BFGS/Limited memory BFGS

2.2.2 DFP

2.2.3 Numerical experiments

2.3 Gauss-Newton Methods

2.3.1 Levenberg-Marquardt method (LM)

2.3.2 Flexible Gauss-Newton Method

2.3.3 Numerical experiments

2.4 Stochastic methods

2.4.1 SGD with momentum

2.4.2 Mini-batched Nesterov accelerated gradient descent

2.4.3 Adagrad

2.4.4 Adam

2.4.5 Numerical experiments

2.5 Local conclusion

Chapter 3. Architecture optimization of Wiener-Hammerstein functions

3.1 Attention mechanism

3.1.1 Temporal Pattern Attention

3.1.2 Behavioral modeling of the TPA approach based on IGRNN or IGIRNN

3.1.3 Numerical experiments

3.1.4 Local conclusion

Chapter 4. Model design optimization problem

4.1 Model design algorithms

4.1.1 Grid Search and Random Search

4.1.2 Quasi-Monte Carlo Method

4.1.3 Tree-Structured Parzen Estimator

4.1.4 Covariance Matrix Adaptation Evolution Strategy

4.1.5 Nondominated Sorting Genetic Algorithm II

4.1.6 Optimization process

4.1.7 Details of internal process optimization

4.1.8 Numerical experiments

Conclusion

References

List of Figures

List of Tables

Appendix A. Overview of optimization methods

A.1 Full gradient Algorithms

A.1.1 Conjugate gradients

A.1.2 Nesterov Acceleration

A.2 Quasi-Newton Methods

A.2.1 BFGS/DFP

A.2.2 L-BFGS

A.3 Gauss-Newton Methods

A.3.1 Flexible modification of Gauss-Newton method

Appendix B. Stochastic first-order methods

B.1 SGD-like Algorithms

B.2 Algorithms with second order momentum

Appendix C. Discrete optimization algorithms

C.1 Covariance Matrix Adaptation Evolution Strategy

C.2 Tree-Structured Parzen Estimator


Introduction to the dissertation (part of the abstract) on the topic "Optimization of Wiener-Hammerstein-type signal pre-distortion functionals for eliminating intermodulation components arising during power amplification"

Introduction

Nowadays, base stations, which act as radio signal transceivers, are widely used to implement and organize wireless communication between remote devices. Modern base stations have a complex technical structure and include many hardware components that allow accurate and efficient data transmission. As telecommunications networks evolve from 5G to 6G, a critical issue that needs to be addressed is ensuring that power amplifiers (PAs) meet increasingly stringent standards. The PA's role is to amplify the signal from the base station, reduce the effect of interference on the signal, and increase the transmission range.

An ideal amplifier can be characterized by the function V_A(x) = a · x, where a ≫ 1 and x is the input signal. However, real amplifiers are complex nonlinear analog devices that cannot be described by an analytical function because of many external and internal interfering factors. Power amplifiers can change the phase and clip the amplitude of the original signal and generate parasitic harmonics outside the carrier frequency range. These effects cause significant distortion of the high-frequency, high-bandwidth signal. The spectrum plot in Fig. 1 shows that the described problem is relevant for modern devices: when the signal passes through the power amplifier, its spectrum becomes wider than that of the original signal and, as a result, generates noise for other signals.

Figure 1 — Power spectral density of the original (desired) signal, the PA output signal without DPD, and the pre-distorted (DPD+PA) signal; frequency axis in GHz.
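
To make the spectral-regrowth effect in Figure 1 concrete, the following minimal NumPy sketch (not part of the dissertation) passes a band-limited complex baseband signal through a toy memoryless PA consisting of a linear gain plus a cubic term and compares the out-of-band power before and after; the gain a, the third-order coefficient c3 and the signal parameters are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Band-limited complex baseband "input": white noise low-pass filtered in the
# frequency domain so that it occupies roughly 1/8 of the sampling band.
n = 1 << 14
spec = np.fft.fft(rng.standard_normal(n) + 1j * rng.standard_normal(n))
mask = np.zeros(n)
mask[: n // 16] = 1.0
mask[-n // 16:] = 1.0
x = np.fft.ifft(spec * mask)
x /= np.sqrt(np.mean(np.abs(x) ** 2))       # normalise to unit average power

# Toy memoryless PA: linear gain plus a cubic term (assumed values).
a, c3 = 10.0, -0.5
y = a * x + c3 * x * np.abs(x) ** 2

def out_of_band_power_db(sig, band=n // 16):
    """Power leaking outside the original band, relative to in-band power (dB)."""
    s = np.abs(np.fft.fft(sig)) ** 2
    in_band = s[:band].sum() + s[-band:].sum()
    leak = max(s.sum() - in_band, 1e-12 * in_band)
    return 10 * np.log10(leak / in_band)

print("ideal a*x :", round(out_of_band_power_db(a * x), 1), "dB")
print("real PA   :", round(out_of_band_power_db(y), 1), "dB")
```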

One possible solution to this problem is to employ the digital baseband pre-distortion (DPD) technique to compensate for the non-linear effects that affect the input signal. In this case, the DPD acts upon the input signal with an inverse non-linearity, with the aim of offsetting the impact of the power amplifier.
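
As a hedged illustration of the inverse-nonlinearity idea (the dissertation itself works with the direct-learning architecture and far richer models), the sketch below fits a memoryless odd-order polynomial predistorter by least squares using the indirect-learning shortcut against the same toy PA model; the polynomial basis, the gain G and the training signal are assumptions made purely for this example.

```python
import numpy as np

rng = np.random.default_rng(1)

def pa(u, a=10.0, c3=-0.5):
    """Toy memoryless PA model, assumed purely for this illustration."""
    return a * u + c3 * u * np.abs(u) ** 2

# Complex baseband training signal with unit average power.
x = (rng.standard_normal(50_000) + 1j * rng.standard_normal(50_000)) / np.sqrt(2)
y = pa(x)

def basis(u):
    """Memoryless odd-order polynomial basis [u, u|u|^2, u|u|^4]."""
    return np.stack([u, u * np.abs(u) ** 2, u * np.abs(u) ** 4], axis=1)

# Indirect-learning shortcut: fit a post-inverse p so that p(y / G) ~ x by least
# squares, then reuse its coefficients as the predistorter in front of the PA.
G = 10.0                                  # intended overall linear gain
coef, *_ = np.linalg.lstsq(basis(y / G), x, rcond=None)

x_pd = basis(x) @ coef                    # predistorted PA input
nmse = lambda e: 10 * np.log10(np.mean(np.abs(e) ** 2) / np.mean(np.abs(x) ** 2))
print("NMSE without DPD:", round(nmse(y / G - x), 1), "dB")
print("NMSE with DPD   :", round(nmse(pa(x_pd) / G - x), 1), "dB")
```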

The accuracy and efficiency of DPD depend on the quality of the model used to invert the behavior of the PA, which has traditionally been achieved using polynomial models such as memory polynomials and generalized memory polynomials. As noted above, one of the most important criteria for the operation of a base station is the most accurate and efficient signal transmission, which can no longer be achieved using classical approaches such as Wiener and Hammerstein models. To achieve the required accuracy, the problem must be solved with more complex architectures with a larger number of layers, such as Wiener-Hammerstein models and neural networks, because they can generate the more complex nonlinearities needed to describe the behavior of a power amplifier. However, this type of architecture is mostly non-convex, which complicates optimization and convergence of the model. Because the signal is non-stationary, the model coefficients cannot be found once and for all. Therefore, the study of the convergence of this type of functional has significant potential. In this work, we investigate the convergence of Wiener-Hammerstein functionals using various optimization techniques, primarily first-order ones, as these can be implemented on a real device.
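
For reference, a single Wiener-Hammerstein branch is simply a linear FIR filter, followed by a static nonlinearity, followed by another linear FIR filter. The sketch below evaluates one such branch in NumPy; the tap values, the polynomial order and the use of a single branch are illustrative assumptions, and a practical DPD model stacks several branches whose coefficients are optimized jointly, which is exactly the non-convex problem studied in Chapter 2.

```python
import numpy as np

def fir(h, x):
    """Causal FIR filter with taps h applied to a complex signal x."""
    return np.convolve(x, h)[: len(x)]

def wiener_hammerstein(x, h_in, c_odd, h_out):
    """One W-H branch: input FIR -> static odd-order polynomial -> output FIR."""
    u = fir(h_in, x)
    v = sum(c * u * np.abs(u) ** (2 * k) for k, c in enumerate(c_odd))
    return fir(h_out, v)

rng = np.random.default_rng(2)
x = (rng.standard_normal(4096) + 1j * rng.standard_normal(4096)) / np.sqrt(2)

h_in = np.array([1.0, 0.3, -0.1])      # input memory taps (assumed)
c_odd = np.array([1.0, -0.2, 0.05])    # 1st/3rd/5th-order coefficients (assumed)
h_out = np.array([1.0, 0.1])           # output memory taps (assumed)

y = wiener_hammerstein(x, h_in, c_odd, h_out)
print(y.shape, y.dtype)
```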

As noted earlier, not every solution can be used in hardware, since optimizing a complex function requires a fairly large amount of resources. One way to simplify the functional is to expand it, which simplifies propagating the gradient into the initial layers of a multi-layered architecture. Another significant way to streamline the functional is to incorporate an attention mechanism. This allows information from the hidden layers to be used more accurately, resulting in more precise information transmission, and makes it possible to improve the convergence of the original Wiener-Hammerstein-type structures and neural networks with a small amount of resources. This work presents a study that demonstrates the potential of this approach for the signal pre-distortion problem.

Modern architectures require a large amount of data for optimization; this is necessary to obtain a more detailed hidden representation and to build a correct initial basis for solving the task. Often a large number of parameters is introduced inside the architecture to generate a more complex dependence or to build complex features, but not all of these parameters are needed to generate the necessary nonlinearities. Moreover, with a large number of parameters a non-convex function may have convergence problems due to vanishing gradients. However, it is impossible to disable blocks manually and reduce the number of parameters, since this requires a huge amount of computation. To solve this dimensionality-reduction problem, methods that reduce the architectural complexity of the model without requiring full convergence were tested. Based on this idea, it is possible to automatically reduce the complexity of the model and implement it on a real device.

Taking into account the nonlinear distortion of the signal in the power amplifier (PA) is necessary when designing modern radio communication systems [1], [2]. This is due to the strict requirements on out-of-band emission at the transmitter output, which is mainly determined by the nature of nonlinear distortion in the power amplifier.

The aim of the work.

The goal of this research is to increase the efficiency of digital signal pre-distortion (DPD) based on Wiener-Hammerstein (W-H) functions in order to compensate for the intermodulation distortion (IMD) of the signal that occurs in the power amplifier when the direct learning architecture is used. To achieve this aim, the following tasks were solved:

1. implementation of a data simulator that generates an OFDM signal (a minimal sketch follows this list).

2. capturing data from different PA structures (Doherty-like structures) with different in-band frequency ranges.

3. implementation of a simulated DPD test bench for the massive multiple-input multiple-output (MMIMO) case.

4. implementation of a simulation test bench for the algorithm that reduces the complexity of the W-H structure.
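
A minimal sketch of the OFDM data-simulation step mentioned in task 1 is given below: random QPSK symbols are loaded onto active subcarriers, transformed by an IFFT and prefixed with a cyclic prefix. The numerology (FFT size, number of active subcarriers, CP length, QPSK constellation) is an arbitrary assumption and does not reproduce the simulator used in the dissertation.

```python
import numpy as np

def ofdm_baseband(n_symbols=100, n_fft=1024, n_active=600, cp_len=72, seed=0):
    """Generate a complex OFDM baseband waveform with QPSK-loaded subcarriers."""
    rng = np.random.default_rng(seed)
    half = n_active // 2
    out = []
    for _ in range(n_symbols):
        qpsk = (rng.choice([-1, 1], n_active)
                + 1j * rng.choice([-1, 1], n_active)) / np.sqrt(2)
        grid = np.zeros(n_fft, dtype=complex)
        grid[1:half + 1] = qpsk[:half]     # positive-frequency carriers (DC empty)
        grid[-half:] = qpsk[half:]         # negative-frequency carriers
        sym = np.fft.ifft(grid) * np.sqrt(n_fft)
        out.append(np.concatenate([sym[-cp_len:], sym]))   # prepend cyclic prefix
    return np.concatenate(out)

x = ofdm_baseband()
papr_db = 10 * np.log10(np.max(np.abs(x) ** 2) / np.mean(np.abs(x) ** 2))
print(len(x), "samples, PAPR =", round(papr_db, 1), "dB")
```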

Propositions for the defense:

1. The algorithms proposed in this work for optimizing W-H direct-learning functions provide sufficiently strong compensation of IMD components for the base-station power amplifier in a minimum number of iterations, which the classical gradient method does not allow. The techniques proposed in this work were also validated on the task of cancelling the leakage of the transmitted signal onto the receive path.

2. The proposed modifications of the Wiener-Hammerstein model improve its approximation ability for solving signal pre-distortion problems in power amplification.

3. The proposed algorithm for automatic modification of the structure reduces model complexity and resource usage while maintaining IMD cancellation performance.

Scientific novelty:

1. For the first time, quasi-Newton and full-gradient non-convex optimization methods were proposed for optimizing Wiener-Hammerstein functions in the digital signal pre-distortion problem.

2. For the first time, temporal pattern attention was applied in digital pre-distortion to increase the performance of the W-H structure for the power amplifier of a base station.

3. For the first time, an automatic toolkit for reducing the complexity of the Wiener-Hammerstein model was implemented.

Reliability of the work.

The proposed algorithms make it possible to conduct experiments on a simulated test bench much faster. The proposed algorithm reduces the resource complexity of the original structure, which shows the same performance as the original model after implementation in real hardware. The results of this research were implemented in the product line.

Approbation of the work.

The results were presented at the following conferences:

1. Mathematical Optimization Theory and Operations Research, Irkutsk, Russia, June 16, 2021.

2. Optimization Without Borders, Sochi, Russia, July 1, 2021.

3. Quasilinear Equations, Inverse Problems and Their Applications QIPA, Sochi, Russia, August 20, 2022.

4. International Conference Optimization and Applications OPTIMA, Petrovac, Montenegro, 27 September 2022.

5. Progress In Electromagnetics Research Symposium, Chengdu, China, 15 April, 2024.

Personal contribution.

All of the propositions presented for defense were obtained by the author personally. In addition, the author implemented the software for the simulation algorithms.

Structure of the work.

The dissertation consists of an introduction, 4 chapters, a conclusion, and 3 appendices. The total size of the dissertation is 100 pages, including 26 figures and 9 tables. The list of references contains 97 entries.


Dissertation conclusion on the subject «Other specialties», Aleksandr Yurievich Maslovskiy

Conclusion

This research rigorously examined various optimization strategies for computational graphs that simulate digital pre-distortion in modulated signals. The study involved testing a range of full-gradient techniques alongside stochastic algorithms, providing an in-depth evaluation of their respective strengths and weaknesses.

The Adam optimizer stands out among stochastic algorithms for its efficiency in online training contexts. However, achieving optimal performance requires extensive pre-calculations to identify the best step lengths and batch sizes. This trade-off between adaptability and computational demand is crucial for real-time applications. In contrast, the L-BFGS algorithm consistently demonstrates superior results across all experiments, emerging as the clear leader. An unexpected but significant finding was its optimal memory depth, ranging from 800 to 1,000 iterations, an insight that reinforces its previously established efficacy in DPD tasks. The algorithm offers not only the fastest convergence rates but also the lowest validation-set errors, around 0.05 dB, suggesting robust performance against overfitting [92], [93].
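
For readers who want to see where the memory depth enters in practice, the sketch below shows the corresponding knob in SciPy's L-BFGS-B implementation, where the maxcor option is the history length; the quadratic toy objective and the particular option values are assumptions, not the dissertation's DPD functional, for which the text above reports an optimal depth of 800 to 1000.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Toy real-valued least-squares objective standing in for the DPD model error.
A = rng.standard_normal((2000, 300))
b = rng.standard_normal(2000)

def fun(w):
    r = A @ w - b
    return 0.5 * r @ r, A.T @ r            # objective value and its gradient

res = minimize(
    fun, np.zeros(300), jac=True, method="L-BFGS-B",
    options={
        "maxcor": 50,    # history (memory) depth; the dissertation reports
                         # 800-1000 as optimal for its DPD functionals
        "maxiter": 500,
        "gtol": 1e-10,
    },
)
print(res.nit, "iterations, final objective:", res.fun)
```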

Additionally, the study introduced innovative modifications to the Gauss-Newton method, particularly the Method of Stochastic Squares, proposed by Yu.E. Nesterov. This method showcased substantial improvements in practical efficiency over traditional Gauss-Newton methods, establishing itself as the most effective among local optimization strategies evaluated.
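
For comparison with the classical baseline mentioned in Section 2.3, a plain damped Gauss-Newton (Levenberg-Marquardt) step solves (J^T J + damping*I) delta = -J^T r for the residual vector r and its Jacobian J. The sketch below implements this single step and applies it to a tiny assumed curve-fitting problem; it is not Nesterov's flexible or stochastic modification discussed above.

```python
import numpy as np

def lm_step(theta, residual_fn, jacobian_fn, damping=1e-3):
    """One Levenberg-Marquardt update: solve (J^T J + damping*I) delta = -J^T r."""
    r = residual_fn(theta)
    J = jacobian_fn(theta)
    H = J.T @ J + damping * np.eye(theta.size)
    return theta + np.linalg.solve(H, -J.T @ r)

# Tiny assumed curve-fitting example: recover k from samples of exp(k * t).
t = np.linspace(0.0, 1.0, 50)
y = np.exp(0.7 * t)
residual = lambda th: np.exp(th[0] * t) - y
jacobian = lambda th: (t * np.exp(th[0] * t)).reshape(-1, 1)

theta = np.array([0.0])
for _ in range(20):
    theta = lm_step(theta, residual, jacobian)
print("estimated k:", round(theta[0], 4))
```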

An extensive series of experiments were conducted to analyze the impact of varying training sample sizes on model performance. Remarkably, utilizing only 20% of the original dataset was sufficient to achieve validation quality metrics of -38 dB. This finding highlights the efficiency of data usage, demonstrating that even with just 5% of the dataset, a threshold of -37 dB could be reached in a significantly shorter timeframe. Such insights are particularly relevant in scenarios where computational resources are limited or where rapid training is essential[94].

The study also explored the application of these optimization techniques to a closely related task involving mobile devices based on the Wiener-Hammerstein structure. The results indicated that the L-BFGS method maintained performance levels comparable to those of least-squares solutions, reinforcing its versatility across different model architectures. This adaptability is crucial for implementing effective DPD solutions in real-world applications.

In a further exploration of neural network architectures, the research examined low-complexity networks that integrate attention mechanisms to address the vanishing gradient problem common in recurrent neural networks (RNNs). By employing a custom attention strategy, referred to as the temporal memory approach, these models effectively captured both short-term and long-term memory dynamics in radio frequency power amplifiers (RF-PAs) [63]. Empirical evidence demonstrated that this approach significantly outperformed traditional RNN variants such as GRU and LSTM models, while also reducing computational complexity.
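
To illustrate the general mechanism only, the sketch below applies plain scaled dot-product attention over a history of RNN hidden states; it is not the Temporal Pattern Attention model evaluated in the dissertation (which scores convolutional features of the hidden-state history), and the state width, history length and random inputs are assumptions.

```python
import numpy as np

def dot_product_attention(h_hist, h_t):
    """Weight past hidden states h_hist (T, d) by similarity to the current h_t (d,)."""
    scores = h_hist @ h_t / np.sqrt(h_t.size)    # (T,) similarity scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over the history
    context = weights @ h_hist                   # (d,) weighted summary of the past
    return np.concatenate([h_t, context])        # passed on to the output layer

rng = np.random.default_rng(4)
h_hist = rng.standard_normal((16, 8))   # 16 past hidden states of width 8 (assumed)
h_t = rng.standard_normal(8)            # current hidden state
print(dot_product_attention(h_hist, h_t).shape)
```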

The investigation extended to multi-criteria optimization algorithms for synthesizing digital pre-distortion models. Tree-Structured Parzen Estimator (TPE) and Covariance Matrix Adaptation Evolution Strategy (CMA-ES) were highlighted as the most suitable algorithms for this purpose. TPE, which employs a closed-loop Bayesian optimization process, showed superior median convergence curves in various simulations. CMA-ES also showed outstanding performance in fully converging model parameters. Both approaches were compared to traditional greedy algorithms, such as grid search and Non-Dominated Sorting Genetic Algorithm II, which were effective but computationally intensive [91]. Crucially, the study addresses the issue of model complexity, emphasizing the need for balance to avoid over-simplification or excessive complexity. Simple models may fail to capture the intricacies of data, while overly complex models can lead to inefficient resource use and high computational costs during training. Algorithms identified throughout this research provide practical solutions for optimizing performance while minimizing resource expenditure.
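
As a hedged sketch of how such a model-design search can be set up, the example below runs both a TPE sampler and a CMA-ES sampler from the open-source Optuna library (which the dissertation does not reference) on a toy scalarized objective that trades an artificial model error against an artificial complexity penalty; the hyperparameter names and ranges are assumptions for illustration only.

```python
import optuna

def objective(trial):
    # Toy stand-in for "DPD error vs. model complexity": the sampler picks block
    # sizes, and a complexity penalty forces a trade-off between the two goals.
    n_taps = trial.suggest_int("n_taps", 1, 32)
    order = trial.suggest_int("poly_order", 1, 9, step=2)
    model_error = 1.0 / (n_taps * order)       # pretend error falls with model size
    complexity = 1e-3 * n_taps * order         # pretend resource cost grows with it
    return model_error + complexity

for sampler in (optuna.samplers.TPESampler(seed=0),
                optuna.samplers.CmaEsSampler(seed=0)):
    study = optuna.create_study(direction="minimize", sampler=sampler)
    study.optimize(objective, n_trials=50)
    print(type(sampler).__name__, study.best_params, round(study.best_value, 4))
```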

Overall, this research not only advances our understanding of the DPD system optimization, but also lays the groundwork for future exploration in this field. Findings suggest that adaptive and low-complexity strategies significantly enhance the efficiency and accuracy of digital signal processing applications. This provides a valuable reference point for both theoretical and practical implementations. The insights gained from this work have implications that extend beyond the scope of DPD, potentially impacting a range of applications in the fields of signal processing and machine learning.

List of references of the dissertation research, Candidate of Sciences Aleksandr Yurievich Maslovskiy, 2024

References

1. Briffa, M. A. Linearization of RF power amplifiers : PhD thesis / Briffa Mark A. — Victoria University, 1996.

2. Ghannouchi, F. M. Behavioral modeling and predistortion of wideband wireless transmitters / F. M. Ghannouchi, O. Hammi, M. Helaoui. — John Wiley & Sons, 2015.

3. Haykin, S. S. Adaptive filter theory / S. S. Haykin. — Pearson Education India, 2008.

4. Analysis of oscillator phase-noise effects on self-interference cancellation in full-duplex OFDM radio transceivers / V. Syrjala [et al.] // IEEE Transactions on Wireless Communications. — 2014. — Vol. 13, no. 6. — P. 2977—2990.

5. Tehrani, A. S. Behavioral modeling of wireless transmitters for distortion mitigation / A. S. Tehrani. — Chalmers University of Technology, Gothenburg, Sweden : Chalmers Reproservice, 2012.

6. Non-convex Optimization in Digital Pre-distortion of the Signal / A. Maslovskiy [et al.] // Mathematical Optimization Theory and Operations Research: Recent Trends / ed. by A. Strekalovsky [et al.]. — Cham : Springer International Publishing, 2021. — P. 54—70.

7. Nemirovsky, A. Problem Complexity and Method Efficiency in Optimization / A. Nemirovsky, D. Yudin. — J. Wiley & Sons, New York, 1983.

8. Gasnikov, A. Universal gradient descent / A. Gasnikov // arXiv preprint arXiv:1711.00394. — 2017.

9. Nesterov, Y. How to make the gradients small / Y. Nesterov // Optima. — 2012. — Vol. 88. — P. 10—11.

10. Torn, A. Global optimization. Vol. 350 / A. Torn, A. vZilinskas. — Springer, 1989.

11. Zhigljavsky, A. A. Theory of global random search. Vol. 65 / A. A. Zhigljavsky. — Springer Science & Business Media, 2012.

12. Polyak, B. T. Some methods of speeding up the convergence of iteration methods / B. T. Polyak // USSR Computational Mathematics and Mathematical Physics. — 1964. — Vol. 4, no. 5. — P. 1—17.

13. Muehlebach, M. Optimization with Momentum: Dynamical, Control-Theoretic, and Symplectic Perspectives / M. Muehlebach, M. I. Jordan // arXiv preprint arXiv:2002.12493. — 2020.

14. Hestenes, M. R. Methods of Conjugate Gradients for Solving Linear Systems / M. R. Hestenes, E. Stiefel // Journal of Research of the National Bureau of Standards. — 1952. — Vol. 49, no. 6. — P. 409—436.

15. Fletcher, R. Function minimization by conjugate gradients / R. Fletcher, C. M. Reeves // The Computer Journal. — 1964. — Vol. 7, no. 2. — P. 149—154.

16. Поляк, Б. Т. Метод сопряжённых градиентов в задачах на экстремум / Б. Т. Поляк // Ж. вычисл. матем. и матем. физ. — 1969. — Т. 9, № 4. — С. 807—821.

17. Polak, E. Note sur la convergence de méthodes de directions conjuguées / E. Polak, G. Ribière // Française d'Informatique et de Recherche Opérationnelle. — 1969. — No. 16. — P. 35—43.

18. Powell, M. J. D. Restart procedures for the conjugate gradient method / M. J. D. Powell // Mathematical programming. — 1977. — Vol. 12, no. 1. — P. 241—254.

19. Neumaier, A. On convergence and restart conditions for a nonlinear conjugate gradient method / A. Neumaier. — 1997. — Institut fur Mathematik, Universitat Wien, preprint.

20. Dai, Y.-H. On restart procedures for the conjugate gradient method / Y.-H. Dai, L.-Z. Liao, D. Li // Numerical Algorithms. — 2004. — Vol. 35, no. 2—4. — P. 249—260.

21. Andrei, N. Open Problems in Nonlinear Conjugate Gradient Algorithms for Unconstrained Optimization / N. Andrei // Bulletin of the Malaysian Mathematical Sciences Society. — 2011. — Vol. 34, no. 2. — URL: http://math.usm.my/bulletin/pdf/v34n2/v34n2p11.pdf.

22. Andrei, N. 40 Conjugate gradient algorithms for unconstrianed optimization. A survey on their definition / N. Andrei // ICI Technical Report. — 2008. — No. 13. — P. 1—13.

23. Andrei, N. Another Conjugate Gradient Algorithm with Guaranteed Descent and Conjugacy Conditions for Large-scale Unconstrained Optimization / N. Andrei // Journal of Optimization Theory and Applications. — 2013. — Vol. 159, no. 1. — P. 159—182.

24. Andrei, N. A new three-term conjugate gradient algorithm for unconstrained optimization / N. Andrei // Numerical Algorithms. — 2015. — Vol. 68, no. 2. — P. 305—321.

25. Нестеров, Ю. Е. Эффективные методы в нелинейном программировании / Ю. Е. Нестеров. — М. : Радио и связь, 1989. — 304 с.

26. Нестеров, Ю. Е. Метод минимизации выпуклых функций со скоростью сходимости 0(1/к2) / Ю. Е. Нестеров // Докл. АН СССР. — 1983. — Т. 269, № 3. — С. 543—547.

27. Nesterov, Y. Introductory Lectures on Convex Optimization. Vol. 87 / Y. Nesterov. — Boston, MA : Springer US, 2004. — (Applied Optimization).

28. Nesterov, Y. Gradient methods for minimizing composite functions / Y. Nesterov // Mathematical Programming. — 2012. — Vol. 140, no. 1. — P. 125—161.

29. Nesterov, Y. Smooth minimization of non-smooth functions / Y. Nesterov // Mathematical Programming. — 2005. — Vol. 103, no. 1. — P. 127—152.

30. Гасников, А. В. Универсальный метод для задач стохастической композитной оптимизации / А. В. Гасников, Ю. Е. Нестеров. — 2016. — arXiv: 1411.4218.

31. Горнов, А. Ю. Вычислительные технологии решения задач оптимального управления / А. Ю. Горнов. — Новосибирск : Наука, 2009. — 278 с.

32. Nocedal, J. Updating quasi-Newton matrices with limited storage / J. Nocedal // Mathematics of computation. — 1980. — Vol. 35, no. 151. — P. 773—782.

33. Dennis Jr, J. E. Quasi-Newton methods, motivation and theory / J. E. Dennis Jr, J. J. Moré // SIAM review. — 1977. — Vol. 19, no. 1. — P. 46—89.

34. Liu, D. C. On the limited memory BFGS method for large scale optimization / D. C. Liu, J. Nocedal // Mathematical programming. — 1989. — Vol. 45, no. 1. — P. 503—528.

35. Skajaa, A. Limited memory BFGS for nonsmooth optimization / A. Skajaa // Master's thesis. — 2010.

36. Polyak, B. T. Minimization of Unsmooth Functionals / B. T. Polyak // USSR Computational Mathematics and Mathematical Physics. — 1969. — Vol. 9, issue 3. — P. 14—29.

37. Barzilai, J. Two-Point Step Size Gradient Methods / J. Barzilai, J. M. Borwein // IMA Journal of Numerical Analysis. — 1988. — Vol. 8. — P. 141—148.

38. Neculai, A. Conjugate Gradient Algorithms for Unconstrained Optimization. A Survey on Their Definition / A. Neculai // ICI Technical Report. — 2008. — Vol. 13. — P. 1—13.

39. Nesterov, Y. Flexible Modification of Gauss-Newton Method / Y. Nesterov. — 2021.

40. Yudin, N. Flexible Modification of Gauss-Newton Method and Its Stochastic Extension / N. Yudin, A. Gasnikov // arXiv preprint arXiv:2102.00810. — 2021.

41. Marquardt, D. W. An algorithm for least-squares estimation of nonlinear parameters / D. W. Marquardt // Journal of the society for Industrial and Applied Mathematics. — 1963. — Vol. 11, no. 2. — P. 431—441.

42. On automatic differentiation / A. Griewank [et al.] // Mathematical Programming: recent developments and applications. — 1989. — Vol. 6, no. 6. — P. 83—107.

43. Nocedal, J. Numerical optimization / J. Nocedal, S. Wright. — Springer Science & Business Media, 2006.

44. Ruder, S. An overview of gradient descent optimization algorithms / S. Ruder // arXiv preprint arXiv:1609.04747. — 2016.

45. Lower bounds for non-convex stochastic optimization / Y. Arjevani [et al.] // arXiv preprint arXiv:1912.02365. — 2019.

46. On the Convergence of Adam and Adagrad / A. Défossez [et al.] // arXiv preprint arXiv:2003.02395. — 2020.

47. Yun, J. A General Family of Stochastic Proximal Gradient Methods for Deep Learning / J. Yun, A. C. Lozano, E. Yang // arXiv preprint arXiv:2007.07484. — 2020.

48. On the importance of initialization and momentum in deep learning / I. Sutskever [et al.] // International conference on machine learning. — PMLR. 2013. — P. 1139—1147.

49. Duchi, J. Adaptive subgradient methods for online learning and stochastic optimization / J. Duchi, E. Hazan, Y. Singer // Journal of Machine Learning Research. — 2011. — Vol. 12, Jul. — P. 2121—2159.

50. Kingma, D. Adam: a method for stochastic optimization / D. Kingma, J. Ba // ICLR. — 2015.

51. Why ADAM beats SGD for attention models / J. Zhang [et al.] // arXiv preprint arXiv:1912.03194. — 2019.

52. Goodfellow, I. Deep learning / I. Goodfellow, Y. Bengio, A. Courville. — MIT press, 2016.

53. Gorbunov, E. Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping / E. Gorbunov, M. Danilova, A. Gasnikov // arXiv preprint arXiv:2005.10785. — 2020.

54. Mikolov, T. Statistical language models based on neural networks / T. Mikolov // Presentation at Google, Mountain View, 2nd April. — 2012. — Vol. 80.

55. Pascanu, R. On the difficulty of training recurrent neural networks / R. Pascanu, T. Mikolov, Y. Bengio // International conference on machine learning. — 2013. — P. 1310—1318.

56. Usmanova, I. Robust solutions to stochastic optimization problems / I. Usmanova // Master Thesis (MSIAM); Institut Polytechnique de Grenoble ENSIMAG, Laboratoire Jean Kuntz-mann. — 2017.

57. Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity / J. Zhang [et al.] // International Conference on Learning Representations. — 2020.

58. Hazan, E. Beyond convexity: Stochastic quasi-convex optimization / E. Hazan, K. Levy, S. Shalev-Shwartz // Advances in Neural Information Processing Systems. — 2015. — P. 1594—1602.

59. Levy, K. Y. The power of normalization: Faster evasion of saddle points / K. Y. Levy // arXiv preprint arXiv:1611.04831. — 2016.

60. Amir, I. SGD Generalizes Better Than GD (And Regularization Doesn't Help) / I. Amir, T. Koren, R. Livni // arXiv preprint arXiv:2102.01117. — 2021.

61. Bao, J. L. Restarted LBFGS Algorithm for Power Amplifier Predistortion / J. L. Bao, R. X. Zhu, H. X. Yuan // Applied Mechanics and Materials. Vol. 336. — Trans Tech Publ. 2013. — P. 1871—1876.

62. Luong, M.-T. Effective approaches to attention-based neural machine translation / M.-T. Luong, H. Pham, C. D. Manning // arXiv preprint arXiv:1508.04025. — 2015.

63. Maslovskiy, A. Application of Attention Technique for Digital Pre-distortion / A. Maslovskiy, A. Kunitsyn, A. Gasnikov // International Conference on Optimization and Applications. — Springer. 2022. — P. 168—182.

64. Efficient BackProp / Y. LeCun [et al.] // Neural Networks: Tricks of the Trade / ed. by G. B. Orr, K.-R. Müller. — Berlin, Heidelberg : Springer Berlin Heidelberg, 1998. — P. 9—50.

65. An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation / H. Larochelle [et al.] // Proceedings of the 24th International Conference on Machine Learning. — Corvalis, Oregon, USA : Association for Computing Machinery, 2007. — P. 473—480. — (ICML '07). — URL: https://doi.org/10.1145/1273496.1273556.

66. Bergstra, J. Random Search for Hyper-Parameter Optimization / J. Bergstra, Y. Bengio // J. Mach. Learn. Res. — 2012. — Feb. — Vol. 13. — P. 281—305.

67. Mockus, J. The Application of Bayesian Methods for Seeking the Extremum / J. Mockus, V. Tiesis, A. Zilinskas // Towards Global Optimization. — 1978. — Vol. 2. — P. 117—129.

68. Monte Carlo simulation method for behavior analysis of an autonomous underwater vehicle / J. Enayati [et al.] // Proceedings of the Institution of Mechanical Engineers, Part M: Journal of Engineering for the Maritime Environment. — 2016. — Vol. 230, no. 3. — P. 481—490.

69. Ortiz, S. Autonomous navigation in unknown environments using robust SLAM / S. Ortiz, W. Yu, X. Li // IECON 2019 - 45th Annual Conference of the IEEE Industrial Electronics Society. Vol. 1. — 2019. — P. 5590—5595.

70. Martino, L. Compressed Monte Carlo with application in particle filtering / L. Martino, V. Elvira // Information Sciences. — 2021. — Vol. 553. — P. 331—352. — URL: https://www.sciencedirect.com/science/article/pii/S0020025520310124.

71. Monte Carlo Analysis as a Trajectory Design Driver for the TESS Mission / C. Nickel [et al.] //. — 2016.

72. Monte Carlo Techniques in Thermal Analysis - Design Margins Determination Using Reduced Models and Experimental Data // SAE Transactions. — 2006. — Vol. 115. — P. 304—311. — (Visited on 02/25/2023).

73. Snoek, J. Practical Bayesian Optimization of Machine Learning Algorithms / J. Snoek, H. Larochelle, R. P. Adams // Advances in Neural Information Processing Systems. Vol. 25 / ed. by F. Pereira [et al.]. — Curran Associates, Inc., 2012. — URL: https://proceedings.neurips.cc/paper/2012/file/05311655a15b75fab86956663e1819cd-Paper.pdf.

74. Rakotoarison, H. Automated Machine Learning with Monte-Carlo Tree Search / H. Rakotoarison, M. Schoenauer, M. Sebag // Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19. — International Joint Conferences on Artificial Intelligence Organization, 07/2019. — P. 3296—3303. — URL: https://doi.org/10.24963/ijcai.2019/457.

75. A Gradient Based Strategy for Hamiltonian Monte Carlo Hyperparameter Optimization / A. Campbell [et al.] // Proceedings of the 38th International Conference on Machine Learning. Vol. 139 / ed. by M. Meila, T. Zhang. — PMLR, 18-24 Jul 2021. — P. 1238—1248. — (Proceedings of Machine Learning Research).

76. Algorithms for Hyper-Parameter Optimization / J. Bergstra [et al.] // Advances in Neural Information Processing Systems. Vol. 24 / ed. by J. Shawe-Taylor [et al.]. — Curran Associates, Inc., 2011.

77. Multiobjective Tree-Structured Parzen Estimator / Y. Ozaki [et al.] //J. Artif. Int. Res. — 2022. — May. — Vol. 73.

78. Auger, A. Tutorial CMA-ES: Evolution Strategies and Covariance Matrix Adaptation / A. Auger, N. Hansen // Proceedings of the 14th Annual Conference Companion on Genetic and Evolutionary Computation. — Philadelphia, Pennsylvania, USA : Association for Computing Machinery, 2012. — P. 827—848. — (GECCO '12). — URL: https://doi.org/10.1145/2330784.2330919.

79. Larrañaga, P. A Review on Estimation of Distribution Algorithms / P. Larrañaga // Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation / ed. by P. Larrañaga, J. A. Lozano. — Boston, MA : Springer US, 2002. — P. 57—100.

80. Genetic algorithms for hyperparameter optimization in predictive business process monitoring / C. Di Francescomarino [et al.] // Information Systems. — 2018. — Vol. 74. — P. 67—83. — URL: https://www.sciencedirect.com/science/article/pii/S0306437916305695 ; Information Systems Engineering: selected papers from CAiSE 2016.

81. Sheta, A. Genetic Algorithms: A tool for image segmentation / A. Sheta, M. Braik, S. Aljahdali //. — 05/2012. — P. 84—90.

82. Alibrahim, H. Hyperparameter Optimization: Comparing Genetic Algorithm against Grid Search and Bayesian Optimization / H. Alibrahim, S. A. Ludwig // 2021 IEEE Congress on Evolutionary Computation (CEC). — 2021. — P. 1551—1559.

83. Belyaev, A. Hybrid control algorithm based on LQR and genetic algorithm for active support weight compensation system / A. Belyaev, O. Sumenkov // IFAC-PapersOnLine. — 2021. — Vol. 54, no. 13. — P. 431—436. — 20th IFAC Conference on Technology, Culture, and International Stability TECIS 2021.

84. Evolutionary algorithms for hyperparameter optimization in machine learning for application in high energy physics / L. Tani [et al.] // The European Physical Journal C. — 2021. — Vol. 81, no. 2. — P. 67—83.

85. A fast and elitist multiobjective genetic algorithm: NSGA-II / K. Deb [et al.] // IEEE Transactions on Evolutionary Computation. — 2002. — Vol. 6, no. 2. — P. 182—197.

86. Multi-objective hyperparameter tuning and feature selection using filter ensembles / M. Binder [et al.] // Proceedings of the 2020 Genetic and Evolutionary Computation Conference. — 2019.

87. Morales-Hernández, A. A survey on multi-objective hyperparameter optimization algorithms for machine learning / A. Morales-Hernández, I. Van Nieuwenhuyse // Artificial Intelligence Review. — 2022.

88. Ozaki, Y. Hyperparameter Optimization Method in Machine Learning: Overview and Features / Y. Ozaki, M. Nomura, M. Onishi // IEICE Transactions on Information and Systems. — 2020. — Vol. J.103—D, no. 9. — P. 615—631.

89. Shekhar, S. A Comparative study of Hyper-Parameter Optimization Tools / S. Shekhar, A. Bansode, A. Salim // 2021 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE). — 2021. — P. 1—6.

90. A comparison of optimisation algorithms for high-dimensional particle and astrophysics applications / C. Balazs [et al.] // Journal of High Energy Physics. — 2021. — May. — Vol. 2021, no. 5.

91. Application of discrete multicriteria optimization methods for the digital predistortion model design / A. Y. Maslovskii [et al.] //. — 2023. — Vol. 15, no. 2. — P. 281—300.

92. Non-convex Optimization in Digital Pre-distortion of the Signal / A. Maslovskiy [et al.] // International Conference on Mathematical Optimization Theory and Operations Research. — Springer. 2021. — P. 54—70.

93. Brief Research of Traditional and AI-based Models for IMD2 Cancellation / A. A. Degtyarev [et al.] // 2024 Photonics Electromagnetics Research Symposium (PIERS). — 2024. — P. 1—7.

94. Non-convex optimization in digital pre-distortion of the signal / D. Pasechnyuk [et al.] // arXiv preprint arXiv:2103.10552. — 2021.

95. Liu, F. Adaptive Parallel Householder Bidiagonalization / F. Liu, F. Seinstra //. — 08/2009. — P. 821—833.

96. Hnětynková, I. The regularizing effect of the Golub-Kahan iterative bidiagonalization and revealing the noise level in the data / I. Hnětynková, M. Plešinger, Z. Strakoš // BIT Numerical Mathematics. — 2009. — Vol. 49, no. 4. — P. 669—696.

97. Ward, R. AdaGrad stepsizes: sharp convergence over nonconvex landscapes / R. Ward, X. Wu, L. Bottou // International Conference on Machine Learning. — 2019. — P. 6677—6686.
