A Detailed Introduction to NLopt

NLopt Introduction

In this chapter of the manual, we begin with an overview of the optimization problems that NLopt solves, the key distinctions between different types of optimization algorithms, and some comments on how to cast various problems into the form NLopt requires. We also describe the background and goals of NLopt.

1 Optimization problems

  • NLopt addresses general nonlinear optimization problems of the form
    $\min_{\mathbf{x}\in\mathbb{R}^n} f(\mathbf{x})$
    where f is the objective function and x represents the n optimization parameters (also called design variables or decision parameters).
  • This problem may optionally be subject to bound constraints (also called box constraints):
    $lb_i \leq x_i \leq ub_i$ for $i=1,\ldots,n$
    given lower bounds lb and upper bounds ub (which may be $-\infty$ and/or $+\infty$, respectively, for partially or totally unconstrained problems). If $lb_i = ub_i$, that parameter is eliminated.
  • One may also optionally have m nonlinear inequality constraints (this is sometimes called a nonlinear programming problem):
    $fc_i(\mathbf{x}) \leq 0$ for $i=1,\ldots,m$
    for constraint functions $fc_i(\mathbf{x})$. Some NLopt algorithms also support p nonlinear equality constraints:
    $h_i(\mathbf{x}) = 0$ for $i=1,\ldots,p$
  • More generally, several constraints at a time can be combined into a single function that returns a vector-valued result.
  • A point x that satisfies all of the bound, inequality, and equality constraints is called a feasible point, and the set of all feasible points is the feasible region. (A minimal C sketch of setting up such a problem appears after this list.)
  • [Note]: in this introduction we follow the usual mathematical convention of letting indices start at 1. In the C programming language, however, NLopt follows C's zero-based convention (for example, the constraints i run from 0 to m−1).
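
As a concrete illustration, here is a minimal sketch of such a problem in the NLopt C API, following the calling pattern from the NLopt tutorial: a two-parameter objective, one lower bound, and one nonlinear inequality constraint. The particular objective and constraint functions are made up purely for illustration.

```c
#include <math.h>
#include <stdio.h>
#include <nlopt.h>

/* Objective f(x) = (x0 - 1)^2 + (x1 - 2)^2 (a made-up example).
   NLopt passes grad == NULL when a derivative-free algorithm is used. */
static double myfunc(unsigned n, const double *x, double *grad, void *data)
{
    if (grad) {
        grad[0] = 2.0 * (x[0] - 1.0);
        grad[1] = 2.0 * (x[1] - 2.0);
    }
    return (x[0] - 1.0) * (x[0] - 1.0) + (x[1] - 2.0) * (x[1] - 2.0);
}

/* One nonlinear inequality constraint fc(x) = x0^2 + x1^2 - 4 <= 0. */
static double myconstraint(unsigned n, const double *x, double *grad, void *data)
{
    if (grad) {
        grad[0] = 2.0 * x[0];
        grad[1] = 2.0 * x[1];
    }
    return x[0] * x[0] + x[1] * x[1] - 4.0;
}

int main(void)
{
    nlopt_opt opt = nlopt_create(NLOPT_LD_MMA, 2);  /* 2 parameters, gradient-based */
    double lb[2] = { 0.0, -HUGE_VAL };              /* bound constraints; -HUGE_VAL = unbounded below */

    nlopt_set_lower_bounds(opt, lb);
    nlopt_set_min_objective(opt, myfunc, NULL);
    nlopt_add_inequality_constraint(opt, myconstraint, NULL, 1e-8);
    nlopt_set_xtol_rel(opt, 1e-6);

    double x[2] = { 0.5, 0.5 };   /* starting point */
    double minf;                  /* minimum objective value on return */
    if (nlopt_optimize(opt, x, &minf) < 0)
        printf("nlopt failed!\n");
    else
        printf("minimum at (%g, %g) = %g\n", x[0], x[1], minf);

    nlopt_destroy(opt);
    return 0;
}
```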

2 Global versus local optimization

  • NLopt includes algorithms that attempt either global or local optimization of the objective.

2.1 Global optimization

  • Global optimization is the problem of finding the feasible point x that minimizes the objective f(x) over the entire feasible region.
  • In general, this can be a very difficult problem, and it becomes exponentially harder as the number of parameters n increases.
  • In fact, unless special information about f is known, it is not even possible to be certain that the true global optimum has been found, because there might be a sudden dip of f hidden somewhere in the part of the parameter space you have not yet looked at.
  • Nevertheless, NLopt includes several global optimization algorithms that work well on reasonably well-behaved problems, if the dimension n is not too large.

2.2 Local optimization

  • Local optimization is a much easier problem.
  • Its goal is to find a feasible point x that is only a local minimum: f(x) is less than or equal to the value of f at all nearby feasible points (that is, on the intersection of the feasible region with at least some small neighborhood of x).
  • In general, a nonlinear optimization problem may have many local minima, and which one is found typically depends on the starting point that the user supplies to the algorithm.
  • On the other hand, local optimization algorithms can often locate a local minimum quickly even in very high-dimensional problems (especially when gradient-based algorithms are used).
  • In a somewhat confusing use of terminology, an algorithm that is guaranteed to find some local minimum from any feasible starting point is said to be globally convergent.

2.3 Convex optimization problems

  • In the special class of convex optimization problems, where the objective and inequality constraint functions are all convex functions (and the equality constraints are affine, or in any case have convex level sets), there is only one local minimum value of f, so a local optimization method finds the global optimum.
  • There may, however, be more than one point x that attains the same minimum f(x); the optimal points form a convex subset of the (convex) feasible region.
  • Typically, convex problems arise from functions of special analytical form, such as linear programming, semidefinite programming, and quadratic programming problems, and specialized techniques are available to solve them very efficiently.
  • NLopt includes only general methods that do not assume convexity;
  • if you have a provably convex problem, you may be better off with another software package, such as the CVX package from Stanford.

3 Gradient-based versus derivative-free algorithms

3.1 Gradient-based algorithms

  • Especially for local optimization, the most efficient algorithms typically require the user to supply the gradient ∇f in addition to the value f(x) at any given point x (and similarly for any nonlinear constraints).
  • This exploits the fact that, in principle, the gradient can almost always be computed at the same time as the value of f with little additional computational effort (at worst, about the same as a second evaluation of f).
  • If a fast way to compute the derivative of f is not obvious, one typically computes ∇f using an adjoint method, or possibly using automatic differentiation tools.
  • Gradient-based methods are critical for the efficient optimization of very high-dimensional parameter spaces (e.g. n in the thousands or more).

3.2 Derivative-free algorithms

  • On the other hand, computing the gradient is sometimes cumbersome and inconvenient if the objective function is supplied as a complicated program.
  • It may even be impossible if f is non-differentiable (or, worse, discontinuous).
  • In such cases, it is often simpler to optimize with a derivative-free algorithm, which only requires the user to supply the function value f(x) at any given point x.
  • Such methods typically must evaluate f for at least several-times-n points, however, so they are best used when n is small to moderate (up to hundreds).

3.3 How NLopt handles both kinds of algorithms

  • NLopt provides both derivative-free and gradient-based algorithms with a common interface.

3.4 A note of caution

  • If you find yourself computing the gradient by a finite-difference approximation, e.g. (in one dimension)
    $\partial f/\partial x \approx [f(x+\Delta x) - f(x-\Delta x)]/2\Delta x$
    then you should probably use a derivative-free algorithm instead.
  • Finite-difference approximations are not only expensive (2n function evaluations for the gradient using center differences), but they are also notoriously susceptible to rounding errors unless you are very careful.
  • On the other hand, finite-difference approximations are very useful as a way to check that your analytical gradient computation is correct (see the sketch after this list).
  • This is always a good idea, because in my experience it is very easy to have bugs in your gradient code.
  • An incorrect gradient will cause weird problems in a gradient-based optimization algorithm.
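
As an example of such a check, the small helper below compares an analytic gradient against a central-difference estimate, component by component, before the function is handed to a gradient-based algorithm. The step size and the n ≤ 16 assumption are arbitrary choices for this sketch; the callback signature is the same one NLopt objectives use.

```c
#include <math.h>
#include <stdio.h>

/* Objective callback with the same shape as an NLopt objective (grad may be NULL). */
typedef double (*objective_fn)(unsigned n, const double *x, double *grad, void *data);

/* Compare the analytic gradient at x against central differences. */
static void check_gradient(objective_fn f, unsigned n, double *x, void *data)
{
    double grad[16];                       /* assumes n <= 16 for this sketch */
    f(n, x, grad, data);                   /* analytic gradient */
    const double h = 1e-6;                 /* finite-difference step (illustrative) */
    for (unsigned i = 0; i < n; ++i) {
        double xi = x[i];
        x[i] = xi + h; double fp = f(n, x, NULL, data);
        x[i] = xi - h; double fm = f(n, x, NULL, data);
        x[i] = xi;                         /* restore the perturbed component */
        double fd = (fp - fm) / (2.0 * h); /* central difference */
        printf("grad[%u]: analytic=%g, finite-diff=%g, |diff|=%g\n",
               i, grad[i], fd, fabs(grad[i] - fd));
    }
}
```

Calling this once at your starting point is usually enough to catch sign errors or missing terms in the analytic gradient.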

4 Equivalent formulations of optimization problems

There are many equivalent ways to formulate a given optimization problem, even within the framework defined above, and finding the best formulation can be something of an art.

  • To start with a trivial example: suppose that you want to maximize a function g(x). This is equivalent to minimizing f(x) = −g(x). Because of this, there is no need for NLopt to provide separate maximization routines in addition to its minimization routines; the user can simply flip the sign to do maximization. As a convenience, however, NLopt does provide a maximization interface (which performs the necessary sign flip for you internally).
  • A more interesting example is a minimax optimization problem, where the objective f(x) is the maximum of N functions:
    $f(\mathbf{x}) = \max \{ g_1(\mathbf{x}), g_2(\mathbf{x}), \ldots, g_N(\mathbf{x}) \}$
  • You could, of course, pass this objective directly to NLopt, but there is a problem: f(x) is not everywhere differentiable (assuming the g_k are differentiable, f(x) is only piecewise differentiable). Not only does this mean that the most efficient gradient-based algorithms are inapplicable, but even derivative-free algorithms may be slowed down considerably. Instead, it is possible to formulate the same problem as a differentiable one by adding a dummy variable t and N new nonlinear constraints (in addition to any other constraints):
    $\min_{\mathbf{x}\in\mathbb{R}^n, t\in\mathbb{R}} t$
    subject to $g_k(\mathbf{x}) - t \leq 0$ for $k=1,2,\ldots,N$
  • This solves exactly the same minimax problem, but now we have a differentiable objective and constraints, assuming that each g_k is differentiable. Notice that, in this case, the objective by itself is the boring linear function t, and all of the interesting stuff is in the constraints; this is typical of many nonlinear programming problems. (A sketch of this reformulation appears after this list.)
  • Another example would be minimizing the absolute value $|g(\mathbf{x})|$ of some function g(x), which is equivalent to minimizing $\max \{ g(\mathbf{x}), -g(\mathbf{x}) \}$. This, in turn, can be transformed into differentiable nonlinear constraints exactly as in the minimax example above.
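
As a sketch of how this reformulation might be wired up with the NLopt C API, the example below minimizes max{x², 1 − x} over a single variable x by introducing the dummy variable t. The functions g1 and g2 are made-up placeholders and the algorithm choice is just an example.

```c
#include <stdio.h>
#include <nlopt.h>

/* Illustrative smooth functions whose pointwise maximum we want to minimize. */
static double g1(double x) { return x * x; }
static double g2(double x) { return 1.0 - x; }

/* Augmented variables y = (x, t); the objective is just the dummy variable t. */
static double dummy_objective(unsigned m, const double *y, double *grad, void *data)
{
    if (grad) { grad[0] = 0.0; grad[1] = 1.0; }   /* d/dx = 0, d/dt = 1 */
    return y[1];
}

/* Constraints g_k(x) - t <= 0; k is selected through the data pointer. */
static double minimax_constraint(unsigned m, const double *y, double *grad, void *data)
{
    int k = *(int *)data;
    if (grad) {
        grad[0] = (k == 0) ? 2.0 * y[0] : -1.0;   /* d(g_k)/dx */
        grad[1] = -1.0;                           /* d(-t)/dt */
    }
    return ((k == 0) ? g1(y[0]) : g2(y[0])) - y[1];
}

int main(void)
{
    static int ks[2] = { 0, 1 };
    nlopt_opt opt = nlopt_create(NLOPT_LD_MMA, 2);     /* variables: x and t */
    nlopt_set_min_objective(opt, dummy_objective, NULL);
    nlopt_add_inequality_constraint(opt, minimax_constraint, &ks[0], 1e-8);
    nlopt_add_inequality_constraint(opt, minimax_constraint, &ks[1], 1e-8);
    nlopt_set_xtol_rel(opt, 1e-6);

    double y[2] = { 0.0, 1.0 };   /* start with t >= max{g1, g2} so the start is feasible */
    double minf;
    nlopt_optimize(opt, y, &minf);
    printf("minimax solution x = %g, f = %g\n", y[0], minf);
    nlopt_destroy(opt);
    return 0;
}
```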

5 Equality constraints

  • Suppose that you have one or more nonlinear equality constraints
    $h_i(\mathbf{x}) = 0$.
  • In principle, each equality constraint can be expressed by two inequality constraints, $h_i(\mathbf{x}) \leq 0$ and $-h_i(\mathbf{x}) \leq 0$, so you might think that any code that can handle inequality constraints can automatically handle equality constraints. In practice, however, this is not true: if you try to express an equality constraint as a pair of nonlinear inequality constraints, some algorithms will fail to converge.
  • Equality constraints sometimes require special handling because they reduce the dimensionality of the feasible region, and not just its size as an inequality constraint does. Only some of the NLopt algorithms (AUGLAG, COBYLA, and ISRES) currently support nonlinear equality constraints.

5.1 Elimination

  • Sometimes it is possible to handle equality constraints by an elimination procedure: you use the equality constraints to explicitly solve for some parameters in terms of the other, unknown parameters, and then pass only the latter to NLopt as optimization parameters.

  • As an example, suppose that you have a linear equality constraint:
    $A\mathbf{x} = \mathbf{b}$
  • for some constant matrix A. Given a particular solution ξ of these equations and a matrix N whose columns are a basis for the nullspace of A, one can express all possible solutions of these linear equations in the form:
    $\mathbf{x} = \boldsymbol{\xi} + N\mathbf{z}$
  • for an unknown vector z. You could then pass z as the optimization parameters to NLopt, rather than x, thereby eliminating the equality constraints (see the sketch after this list).
  • [Note]: the nullspace matrix N must be computed with some care numerically, because rounding errors tend to make the matrix A appear less singular (of higher rank) than it really is. A standard technique is to compute the singular value decomposition (SVD) of A and set any singular values below some threshold to zero.
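
The sketch below shows the substitution $\mathbf{x} = \boldsymbol{\xi} + N\mathbf{z}$ in code: NLopt optimizes only over the reduced variables z, and a small wrapper reconstructs x before evaluating the original objective. The particular solution xi and the nullspace basis N are assumed to have been computed beforehand (for example with an SVD routine from LAPACK); the names, dimensions, and toy objective are illustrative.

```c
#include <nlopt.h>

/* Original objective in terms of the full variables x (a made-up example). */
static double original_objective(unsigned n, const double *x)
{
    double s = 0.0;
    for (unsigned i = 0; i < n; ++i) s += x[i] * x[i];
    return s;
}

/* Data describing the affine map x = xi + N z for the reduced variables z. */
typedef struct {
    unsigned n, nz;        /* dimensions of x and z (nz = nullspace dimension) */
    const double *xi;      /* particular solution, length n */
    const double *N;       /* nullspace basis, n-by-nz, row-major */
} elimination_data;

/* Wrapper objective in terms of z only: reconstruct x, then evaluate f(x). */
static double reduced_objective(unsigned nz, const double *z, double *grad, void *data)
{
    const elimination_data *e = (const elimination_data *)data;
    double x[64];                              /* assumes e->n <= 64 in this sketch */
    for (unsigned i = 0; i < e->n; ++i) {
        x[i] = e->xi[i];
        for (unsigned j = 0; j < nz; ++j)
            x[i] += e->N[i * nz + j] * z[j];   /* x = xi + N z */
    }
    (void)grad;                                /* derivative-free here for brevity; with
                                                  gradients one would use grad_z f = N^T grad_x f */
    return original_objective(e->n, x);
}

void optimize_reduced(elimination_data *e, double *z /* length e->nz, start and result */)
{
    nlopt_opt opt = nlopt_create(NLOPT_LN_COBYLA, e->nz);  /* optimize over z, not x */
    nlopt_set_min_objective(opt, reduced_objective, e);
    nlopt_set_xtol_rel(opt, 1e-6);
    double minf;
    nlopt_optimize(opt, z, &minf);
    nlopt_destroy(opt);
}
```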

5.2 Penalty functions

  • Another popular approach to handling equality constraints (and inequality constraints as well) is to include some sort of penalty function in the objective, which penalizes values of x that violate the constraints. A standard technique of this sort is known as the augmented Lagrangian approach, and a variant of it is implemented in NLopt's AUGLAG algorithm (see the sketch after this list).
  • (For inequality constraints, a variant of the penalty idea is a barrier method: this is simply a penalty that diverges as you approach the constraint boundary, which forces the optimization to stay within the feasible region.)
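
In NLopt, the augmented Lagrangian approach is used by pairing AUGLAG with a subsidiary local optimizer via nlopt_set_local_optimizer, roughly as sketched below. The objective and equality constraint are hypothetical placeholders, and the choice of L-BFGS as the subsidiary algorithm is just an example.

```c
#include <nlopt.h>

/* Hypothetical objective and nonlinear equality constraint h(x) = 0, defined elsewhere. */
double objective(unsigned n, const double *x, double *grad, void *data);
double h_constraint(unsigned n, const double *x, double *grad, void *data);

void solve_with_auglag(unsigned n, double *x /* start and result */)
{
    /* AUGLAG folds the constraints into a penalized objective and hands the
       resulting subproblems to the subsidiary local optimizer. */
    nlopt_opt opt = nlopt_create(NLOPT_AUGLAG, n);
    nlopt_opt local = nlopt_create(NLOPT_LD_LBFGS, n);   /* subsidiary algorithm */
    nlopt_set_xtol_rel(local, 1e-8);
    nlopt_set_local_optimizer(opt, local);
    nlopt_destroy(local);          /* opt keeps a copy of the local settings */

    nlopt_set_min_objective(opt, objective, NULL);
    nlopt_add_equality_constraint(opt, h_constraint, NULL, 1e-8);
    nlopt_set_xtol_rel(opt, 1e-6);

    double minf;
    nlopt_optimize(opt, x, &minf);
    nlopt_destroy(opt);
}
```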

6 Termination conditions

  • For any optimization algorithm, one must supply some termination conditions specifying when the algorithm should stop. Ideally, the algorithm should stop when it has found the optimum to within some desired tolerance. In practice, however, because the true optimum is not known in advance, one uses heuristic estimates of the error in the solution rather than the actual error.
  • NLopt gives the user a choice of several different termination conditions. You do not need to specify all of them for a given problem; you should only set the ones you want. NLopt will terminate as soon as the first of the specified termination conditions is met (i.e., the weakest condition you specify is what matters).
  • The termination conditions supported by NLopt are as follows:

6.1 Function value and parameter tolerances

  • First, you can specify a fractional tolerance ftol_rel and an absolute tolerance ftol_abs on the function values. Ideally, these would be the maximum fractional and absolute errors compared to the exact minimum function value, but that is impossible because the minimum is not known. Instead, most algorithms implement them as a tolerance on the decrease Δf of the function value from one iteration to the next (or something similar): the algorithm stops if |Δf|/|f| is less than ftol_rel, or if |Δf| is less than ftol_abs.
  • Similarly, you can specify a fractional tolerance xtol_rel and absolute tolerances xtol_abs_i on the parameters x. Again, comparing the actual error Δx with the (unknown) minimum is impossible, so in practice Δx is usually measured as the change in x from one iteration to the next, or as the diameter of the search region, or something like that. The algorithm then stops when |Δx_i| < xtol_abs_i or when |Δx_i|/|x_i| < xtol_rel.
  • Note: generally, you can only ask for about half as many decimal places in the xtol as in the ftol. The reason is that, near the minimum, $\Delta f \approx f''(\Delta x)^2/2$ from the Taylor expansion, and so (assuming $f'' \approx 1$ for simplicity) a change in x by $10^{-7}$ gives a change in f by around $10^{-14}$. In particular, this means that it is generally hopeless to request an xtol_rel much smaller than the square root of machine precision.

In most cases, the fractional tolerance (tol_rel) is the most useful one to specify, because it is independent of any absolute scale factors or units. Absolute tolerance (tol_abs) is mainly useful if you think that the minimum function value or parameters might occur at or close to zero.

If you don’t want to use a particular tolerance termination, you can just set that tolerance to zero and it will be ignored.

6.2 Stopping function value

Another termination test that NLopt supports is that you can tell the optimization to stop when the objective function value f(x) reaches some specified value, stopval, for any feasible point x.

This termination test is especially useful when comparing algorithms for a given problem. After running one algorithm for a long time to find the minimum to the desired accuracy, you can ask how many iterations it takes other algorithms to obtain the optimum to the same accuracy, or to some better accuracy.

6.3 Bounds on function evaluations and wall-clock time

Finally, one can also set a termination condition by specifying a maximum number of function evaluations (maxeval) or a maximum wall-clock time (maxtime). That is, the simulation terminates when the number of function evaluations reaches maxeval, or when the total elapsed time exceeds some specified maxtime.

These termination conditions are useful if you want to ensure that the algorithm gives you some answer in a reasonable amount of time, even if it is not absolutely optimal, and are also useful ways to control global optimization.

Note that these are only rough maximums; a given algorithm may exceed the specified maximum time or number of function evaluations slightly.
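
Putting the various stopping criteria together, a typical setup with the NLopt C API might look like the following snippet, where opt is an nlopt_opt object created as in the earlier sketches and the numerical values are arbitrary examples:

```c
/* Relative and absolute tolerances on the objective value. */
nlopt_set_ftol_rel(opt, 1e-8);
nlopt_set_ftol_abs(opt, 0.0);       /* zero disables this test */

/* Relative tolerance on the parameters (roughly the square root of the ftol, see above). */
nlopt_set_xtol_rel(opt, 1e-4);

/* Stop as soon as any feasible point with f(x) <= stopval is found. */
nlopt_set_stopval(opt, 0.5);

/* Rough limits on the number of function evaluations and on wall-clock seconds. */
nlopt_set_maxeval(opt, 10000);
nlopt_set_maxtime(opt, 60.0);
```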

6.4 Termination tests for global optimization

In general, deciding when to terminate a global optimization algorithm is a rather difficult problem, because there is no way to be certain (without special information about a particular f) that you have truly reached the global minimum, or even come close. You never know when there might be a much smaller value of the objective function lurking in some tiny corner of the feasible region.

Because of this, the most reasonable termination criterion for global optimization problems seems to be setting bounds on the run time. That is, set an upper bound on how long you are willing to wait for an answer, and use that as the maximum run time. Another strategy is to start with a shorter run time, and repeatedly double the run time until the answer stops changing to your satisfaction. (Although there can be no guarantee that increasing the time further won’t lead to a much better answer, there’s not much you can do about it.)

I would advise you not to use function-value (ftol) or parameter tolerances (xtol) in global optimization. I made a half-hearted attempt to implement these tests in the various global-optimization algorithms, but it doesn’t seem like there is any really satisfactory way to go about this, and I can’t claim that my choices were especially compelling.

For the MLSL algorithm, the ftol and xtol parameters of the local optimization algorithm control the tolerances of the local searches, not of the global search; you should definitely set these, lest the algorithm spend an excessive amount of time trying to run local searches to machine precision.
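
For instance, with the C API the local tolerances for MLSL would be set on a separate local-optimizer object, using the same nlopt_set_local_optimizer pattern as for AUGLAG above (a minimal sketch; n is the problem dimension and the algorithm choices are merely examples):

```c
nlopt_opt opt   = nlopt_create(NLOPT_G_MLSL_LDS, n);   /* global MLSL with low-discrepancy sampling */
nlopt_opt local = nlopt_create(NLOPT_LD_LBFGS, n);     /* algorithm used for the local searches */

/* These tolerances apply to each local search, not to the global phase. */
nlopt_set_ftol_rel(local, 1e-8);
nlopt_set_xtol_rel(local, 1e-4);
nlopt_set_local_optimizer(opt, local);
nlopt_destroy(local);

/* The global phase is then bounded by evaluation count and/or wall-clock time. */
nlopt_set_maxeval(opt, 100000);
nlopt_set_maxtime(opt, 300.0);
```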

7 Background and goals of NLopt

NLopt was started because some of the students in our group needed to use an optimization algorithm for a nonlinear problem, but it wasn’t clear which algorithm would work best (or work at all). One student started by downloading one implementation from the Web, figuring out how to plug it into her Matlab program, getting it to work, only to find that it didn’t converge very quickly so she needed another one, and so on… Then another student went through the same process, only his program was in C and he needed to get the algorithms to work with that language, and he obtained a different set of algorithms. It quickly became apparent that the duplication of effort was untenable, and the considerable labor required to decipher each new subroutine, figure out how to build it, figure out how to bridge the gap from one language (e.g. Fortran) to another (e.g. Matlab or C), and so on, was so substantial that it was hard to justify trying more than one or two. Even though the first two algorithms tried might converge poorly, or might be severely limited in the types of constraints they could handle, or have other limitations, effective experimentation was impractical.

Instead, since I had some experience in wrapping C and Fortran routines and making them callable from C and Matlab and other languages, it made sense to put together a common wrapper interface for a few of the more promising of the free/open-source subroutines I could find online. Soon, it became clear that I wanted at least one decent algorithm in each major category (constrained/unconstrained, global/local, gradient-based/derivative-free, bound/nonlinear constraints), but there wasn’t always free code available. Reading the literature turned up tantalizing hints of algorithms that were claimed to be very powerful, but again had no free code. And some algorithms had free code, but only in a language like Matlab that was impractical to use in stand-alone fashion from C. So, in addition to wrapping existing code, I began to write my own implementations of various algorithms that struck my interest or seemed to fill a need.

Initially, my plan was to handle only bound constraints, and leave general nonlinear constraints to others—who needs such things? That attitude lasted until we found that we needed to solve a 10,000-dimensional minimax-type problem, which seemed intractable unless gradient-based algorithms could be brought to bear…as discussed above, this requires nonlinear constraints to make the problem differentiable. After some reading, I came across the MMA algorithm, which turned out to be easy to implement (300 lines of C), and worked beautifully (at least for my problem), so I expanded the NLopt interface to support nonlinear constraints.

Overall, I’ve found that this has been surprisingly fun. Every once in a while, I come across a new algorithm to try, and now that I’ve implemented a few algorithms and built up a certain amount of infrastructure, it is relatively easy to add new ones (much more so than when I first started out). So, I expect that NLopt will continue to grow, albeit perhaps more slowly now that it seems to include decent algorithms for a wide variety of problems.
