A quick record of some key points and thoughts on simulation optimization. Mainly based on: Handbook of Simulation Optimization, Michael Fu

Overview

This is the overview of the book; it can in fact also be seen as an overview of the field.

Simulation optimization (SimuOpt): optimizing when the objective function cannot be computed directly but can be estimated by simulation, with noise (the focus is on stochastic simulation environments).

One way to classify the methods: Discrete vs Continuous

Discrete Optimization
- Solution space is small -> Ranking & Selection (based on statistics or simulation budget allocation)
- Solution space is large but finite -> Ordinal Optimization (no need to estimate every candidate accurately, only their order, which converges much faster (exponentially))
- Solution space is countably infinite -> Random Search (globally or locally convergent)
Continuous Optimization
- RSM (Response Surface Methodology). Also has variants that handle constraints, and robust variants
- Stochastic Approximation (RM (Robbins-Monro), KW (Kiefer-Wolfowitz), and simultaneous perturbation stochastic approximation for high-dimensional problems)
- SAA (Sample Average Approximation), with consideration of stochastic constraints.
- Random Search, with a focus on estimation and on the search procedure. Model-based RS is a newer class, which works from a probability model over the solution space.
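To make the stochastic approximation family a bit more concrete, here is a minimal SPSA sketch. The noisy quadratic `noisy_objective`, the gain constants `a` and `c`, and the iteration count are all assumptions chosen for illustration; they are not taken from the handbook.

```python
import random

def noisy_objective(x, rng):
    """Hypothetical simulation: a quadratic with minimum at (1, -2) plus noise."""
    true_value = (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2
    return true_value + rng.gauss(0.0, 0.1)

def spsa(objective, x0, iterations=2000, a=0.1, c=0.1, seed=0):
    """Simultaneous perturbation stochastic approximation (minimization)."""
    rng = random.Random(seed)
    x = list(x0)
    for k in range(1, iterations + 1):
        a_k = a / k ** 0.602   # step-size sequence (commonly used exponent)
        c_k = c / k ** 0.101   # perturbation-size sequence
        # Perturb all coordinates at once with random +/-1 signs.
        delta = [rng.choice((-1.0, 1.0)) for _ in x]
        x_plus = [xi + c_k * di for xi, di in zip(x, delta)]
        x_minus = [xi - c_k * di for xi, di in zip(x, delta)]
        diff = objective(x_plus, rng) - objective(x_minus, rng)
        # Two simulation runs yield a gradient estimate for every coordinate.
        grad = [diff / (2.0 * c_k * di) for di in delta]
        x = [xi - a_k * gi for xi, gi in zip(x, grad)]
    return x

estimate = spsa(noisy_objective, x0=[0.0, 0.0])
```

The point of the design is that each iteration costs two simulation runs regardless of the dimension, which is why SPSA is attractive for high-dimensional problems.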

Since stochasticity is the keyword, some background knowledge is important for discrete optimization (DO) as well as for continuous optimization (CO).

Statistics
- How to estimate a solution
- How to know solution x is better than y
- How to know to what extent we are covering the optimal solution in the search
- How many replications do we need...
- Hypothesis testing
Stochastic constraints
Variance reduction
...
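Two of the questions above ("is x better than y?" and "how many replications?") can be sketched with a confidence interval. This is only an illustration: `simulate` is a hypothetical noisy simulation, and the interval uses a normal approximation, which assumes reasonably large samples.

```python
import math
import random
from statistics import NormalDist, mean, variance

def simulate(true_mean, noise_std, rng):
    """Hypothetical simulation output: true cost plus Gaussian noise."""
    return rng.gauss(true_mean, noise_std)

def difference_interval(sample_x, sample_y, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for E[x] - E[y]."""
    diff = mean(sample_x) - mean(sample_y)
    std_err = math.sqrt(variance(sample_x) / len(sample_x)
                        + variance(sample_y) / len(sample_y))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return diff - z * std_err, diff + z * std_err

def replications_needed(noise_std, half_width, alpha=0.05):
    """Replications so that the interval on one mean has the given half-width."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil((z * noise_std / half_width) ** 2)

rng = random.Random(1)
xs = [simulate(10.0, 2.0, rng) for _ in range(400)]
ys = [simulate(11.0, 2.0, rng) for _ in range(400)]
low, high = difference_interval(xs, ys)
# For minimization: if the whole interval lies below 0, x is significantly better.
x_is_better = high < 0.0
```

In a real study the noise level is unknown, so `noise_std` in `replications_needed` would itself come from a pilot run.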

Discrete Optimization

Three fundamental types of errors:

The optimal solution is never simulated (about search)
The optimal solution was simulated but is not selected (about estimation)
The selected solution is not well estimated (about estimation)

Optimality Conditions

Optimality conditions are needed to 1) ensure the correctness of the algorithm; 2) define the stopping criteria.
For unconstrained nonlinear optimization, we stop at a stationary point.
For integer optimization, we check the gap between the LB (lower bound) and the UB (upper bound).
For SBO (simulation-based optimization), this is difficult because:
- the cost of a solution g(x) can only be estimated
- no structural information can be used to prune parts of the solution space
- complete enumeration of the solution space is often computationally intractable

Different scenarios depending on the solution space size:

Small. Fewer than a few hundred candidates. The key is then how to estimate all solutions well and return the best. In practice we analyze the Probability of Selection Correctness (PSC). The algorithm stops once $P(x^* \in \Theta^*) \geq 1-\alpha$, where x* is the selected best solution and $\Theta^*$ is the set of optimal solutions.
Large.
- It is impossible to simulate all candidates. The idea is then to find a "good enough" solution, meaning that x* is among the t best solutions with a certain probability: $P(|T\cap S|\geq 1) \geq 1-\alpha$, where T is the set of the t best solutions and S is the selected subset. This is used in ordinal optimization.
- Or, choose methods with a global convergence guarantee ( $\lim_{m \rightarrow \infty}P (x^* \in \Theta^*) = 1$ ) or a local convergence guarantee ( $\lim_{m \rightarrow \infty}P (x^* \in \mathcal L) = 1$ ). $\mathcal{L}$ is the set of all local optima, which depends on the definition of the neighborhood structure. Because a neighborhood is usually not large, local optimality can be tested statistically by controlling the type I and type II errors.
- Hypothesis testing: if the hypothesis is true, what is the probability of our observation? This resembles a proof by contradiction, emphasizing rejection rather than acceptance.
- (Meta)heuristics are often found in commercial solvers. These algorithms work well for difficult deterministic integer programs, and they are somewhat tolerant of sampling variability. However, they typically do not satisfy any optimality conditions for DOvS (discrete optimization via simulation) problems and may be misled by sampling variability.
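For the small-space case, the probability of selecting the true best can be approximated empirically on a toy problem. The sketch below is only possible because the true means are known in the example; the means, noise level and replication counts are assumptions made for illustration.

```python
import random
from statistics import mean

def select_best(true_means, noise_std, replications, rng):
    """Simulate every candidate and return the index with the lowest sample mean."""
    sample_means = [
        mean(rng.gauss(mu, noise_std) for _ in range(replications))
        for mu in true_means
    ]
    return min(range(len(true_means)), key=sample_means.__getitem__)

def estimate_pcs(true_means, noise_std, replications, trials=500, seed=0):
    """Fraction of independent trials in which the true best is selected."""
    rng = random.Random(seed)
    best = min(range(len(true_means)), key=true_means.__getitem__)
    hits = sum(
        select_best(true_means, noise_std, replications, rng) == best
        for _ in range(trials)
    )
    return hits / trials

means = [1.0, 1.5, 1.5, 1.5]   # hypothetical true costs, candidate 0 is the best
pcs_few = estimate_pcs(means, noise_std=1.0, replications=5)
pcs_many = estimate_pcs(means, noise_std=1.0, replications=100)
```

Comparing `pcs_few` and `pcs_many` shows how the selection probability improves with the number of replications per candidate.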

Ranking and Selection

Two formulations are considered:

indifference-zone formulation (IZF)
Bayesian formulation (BF)

IZF (Frequentist)

Assume $g(x_1)+\delta=g(x_2)=g(x_3)=\dots$, which is the most difficult case (every competitor is exactly $\delta$ worse than the best). The objective is to find $x_1$, which is at least $\delta$ better than all the others.

Bechhofer's procedure: assuming equal estimation variances $\sigma_1^2=\sigma_2^2=\dots$, Bechhofer determines the number of replications needed to estimate each solution. It then suffices to choose the best one based on the sample mean.
Paulson's procedure: filter progressively. At each iteration: take one observation of each remaining solution, calculate the sample means, and filter out some bad solutions. This is more efficient than Bechhofer's procedure, since a large number of solutions may be filtered out at early stages.
Gupta's procedure (subset selection): similar to the two procedures above, but it returns a set of solutions $I$ and guarantees that $P(x_1 \in I) \geq 1-\alpha$
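The progressive filtering idea can be sketched as follows, for minimization with a known common variance. The elimination threshold (`a / r - lam` with `lam = delta / 2`) follows my recollection of Paulson's rule and should be read as an assumption, not as the handbook's exact statement; the test problem is hypothetical.

```python
import math
import random

def sequential_elimination(true_means, sigma, delta, alpha=0.05,
                           max_rounds=10_000, seed=0):
    """Paulson-style screening (simplified sketch, needs at least 2 solutions)."""
    rng = random.Random(seed)
    k = len(true_means)
    lam = delta / 2.0
    a = sigma ** 2 / (delta - lam) * math.log((k - 1) / alpha)
    survivors = set(range(k))
    totals = [0.0] * k
    rounds = 0
    while len(survivors) > 1 and rounds < max_rounds:
        rounds += 1
        # One new observation for every solution still in contention.
        for i in survivors:
            totals[i] += rng.gauss(true_means[i], sigma)
        best_mean = min(totals[i] / rounds for i in survivors)
        # The tolerated gap shrinks as evidence accumulates.
        slack = max(0.0, a / rounds - lam)
        survivors = {i for i in survivors
                     if totals[i] / rounds <= best_mean + slack}
    chosen = min(survivors, key=lambda i: totals[i] / rounds)
    return chosen, rounds

chosen, rounds = sequential_elimination([0.0, 2.0, 2.0, 2.0],
                                        sigma=1.0, delta=0.5)
```

Once `a / rounds` drops below `lam` the slack is zero and only the current best survives, so the loop ends after at most `ceil(a / lam)` rounds.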

Based on the principle of the above 3 procedures, further procedures include:

NSGS: a two-stage procedure. Compute an initial sample mean, then, according to the variance of the estimates, decide how many extra replications to make. Finally select the best.
KN: in contrast to NSGS, this is not a two-stage procedure but an iterative one, adding replications progressively.
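The two-stage idea can be sketched with a Rinott-style sample size rule, $N_i = \max(n_0, \lceil (h S_i/\delta)^2 \rceil)$. This is a simplification: it skips the screening step that NSGS performs between the stages, and the constant `h` (normally read from tables) is passed in as an assumed value. The `simulate` function and the test problem are hypothetical.

```python
import math
import random
from statistics import mean, variance

def two_stage_selection(simulate, k, delta, h, n0=10, seed=0):
    """Two-stage selection for minimization (sketch without screening)."""
    rng = random.Random(seed)
    # Stage 1: n0 replications per solution to estimate the variances.
    samples = [[simulate(i, rng) for _ in range(n0)] for i in range(k)]
    for i in range(k):
        s2 = variance(samples[i])
        # Total sample size grows with the variance and shrinks with delta.
        total = max(n0, math.ceil(h ** 2 * s2 / delta ** 2))
        # Stage 2: add the missing replications.
        samples[i].extend(simulate(i, rng) for _ in range(total - n0))
    sample_means = [mean(s) for s in samples]
    best = min(range(k), key=sample_means.__getitem__)
    return best, [len(s) for s in samples]

true_means = [0.0, 2.0, 2.0]    # hypothetical true costs
noise_stds = [0.5, 3.0, 1.0]    # hypothetical noise levels

def simulate(i, rng):
    return rng.gauss(true_means[i], noise_stds[i])

best, sizes = two_stage_selection(simulate, k=3, delta=0.5, h=3.0)
```

The returned `sizes` show that the noisier solutions receive more replications in the second stage.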

BF (Bayesian) (Does not provide PSC guarantees)

Used when prior information is available.

It helps to choose the next solution to explore, based on prior information, previous sample results, and the remaining simulation budget. This amounts to an MDP (Markov decision process), which can possibly be solved by ADP/RL.

Generic Bayes procedure: basically an RL procedure: Simulation (state) -> Choose the next solution (action) -> loop
Since it is hard to find the optimal policy (actor), some heuristics have been proposed:
1. OCBA (Optimal Computing Budget Allocation)
2. EVI (Expected Value of Information)
3. KG (Knowledge Gradient)
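As an illustration of the first heuristic, the classical OCBA allocation ratios can be computed as below. The sketch assumes minimization, strictly positive standard deviations and no ties with the current best; the example numbers are made up.

```python
import math

def ocba_allocation(sample_means, sample_stds, budget):
    """Split `budget` replications according to the OCBA ratios."""
    k = len(sample_means)
    b = min(range(k), key=sample_means.__getitem__)   # current best
    ratios = [0.0] * k
    for i in range(k):
        if i != b:
            gap = sample_means[i] - sample_means[b]    # assumed > 0
            # Noisy solutions close to the best get more budget.
            ratios[i] = (sample_stds[i] / gap) ** 2
    # The best gets a share that balances all the competitors.
    ratios[b] = sample_stds[b] * math.sqrt(
        sum((ratios[i] / sample_stds[i]) ** 2 for i in range(k) if i != b)
    )
    total = sum(ratios)
    return [budget * r / total for r in ratios]

allocation = ocba_allocation([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], budget=100)
```

In practice the means and standard deviations are re-estimated after each batch and the allocation is recomputed, which is what makes the procedure sequential.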

Conclusion

Branke et al. found that no R&S procedure is dominant in all situations (thousands of problem structures were tested). BF is often more efficient in terms of the number of samples, but it does not provide a correct-selection guarantee as the frequentist formulation does.

Ordinal Optimization (OO)

When the solution space is large, OO proposes "soft optimization", which selects a subset S from $\Theta$ and limits the analysis to S. We are interested in guaranteeing $P(|T\cap S|\geq l) \geq 1-\alpha$, where T is the set of the top t solutions in the whole space. $l$ is called the alignment level and the probability is called the alignment probability.

Two basic ideas behind OO:

Estimating the order between solutions is much easier than estimating objective values
Accepting good enough solutions leads to an exponential reduction in computational burden

OO is more an analysis framework than a new algorithm. The procedure is:

First fix the target AP (alignment probability)
This then determines the cardinality of the subset $S\subset\Theta$
Then just run R&S on S, and you get the guarantee that the solution is among the top t.
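The link between the alignment probability and the size of the subset can be computed exactly in one simple case: when the subset is picked uniformly at random ("blind pick"), the overlap with the top-t set follows a hypergeometric distribution. Blind picking is an assumption of this sketch; the notes above do not say how the subset is chosen.

```python
from math import comb

def alignment_probability(n_total, t, s, level=1):
    """P(|T ∩ S| >= level) when S is s solutions drawn uniformly at random."""
    hits = sum(
        comb(t, i) * comb(n_total - t, s - i)
        for i in range(level, min(t, s) + 1)
    )
    return hits / comb(n_total, s)

def smallest_subset_size(n_total, t, target, level=1):
    """Smallest |S| whose alignment probability reaches the target."""
    for s in range(level, n_total + 1):
        if alignment_probability(n_total, t, s, level) >= target:
            return s
    return n_total

# Hypothetical example: 1000 solutions, "good enough" means top 50.
size = smallest_subset_size(1000, 50, target=0.95)
```

Even for a large space the required subset stays small compared with the whole space, which is the kind of saving OO relies on.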

In practice I don't think this is so interesting, since it just tells you that the larger $S$ is, the better.

Globally Convergent Adaptive Random Search

Designed for a large but finite solution space. Guarantees $\lim_{m \rightarrow \infty}P (x^* \in \Theta^*) = 1$

Generic GCARS:

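Initialization
Sampling: draw new candidate solutions from the sampling distribution $\mathcal{F}_m(\cdot|\mathcal{M}_m)$, where $\mathcal{M}_m$ is the information kept in memory at iteration m
Estimation: apply the simulation allocation rule $\mathcal{SAR}_m(\mathcal{E}_m|\mathcal{M}_m)$ to the estimation set $\mathcal{E}_m$
Iteration: update the estimated value V(x) for x in $\mathcal{E}_m$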

Several algorithms are described:

Stochastic Ruler Algorithm: accept a candidate solution by comparing its simulated cost with a ruler drawn uniformly, u~U(lb,ub)
Stochastic Branch and Bound (SBB): at each iteration choose the subset of the partition of $\Theta$ with the minimum LB (lower bound), then partition it further
Nested Partitions: an enhancement of SBB that needs to keep less information in memory
R-BEESE (Balanced Explorative and Exploitative Search with Estimation). On each iteration:
1. with probability q, refine the current x* with more replications
2. else, with probability p, sample from Global(theta)
3. else, sample from Local(theta)
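The three R-BEESE steps can be sketched on a toy one-dimensional integer problem. This is a simplified illustration, not the algorithm as specified in the handbook: the solution space `0..n_solutions-1`, the +/-1 local move and the `simulate` function are all assumptions.

```python
import random

def rbeese_style_search(simulate, n_solutions, iterations=3000,
                        q=0.3, p=0.3, seed=0):
    """Random search mixing refinement, global and local sampling."""
    rng = random.Random(seed)
    totals, counts = {}, {}

    def replicate(x):
        totals[x] = totals.get(x, 0.0) + simulate(x, rng)
        counts[x] = counts.get(x, 0) + 1

    def sample_mean(x):
        return totals[x] / counts[x]

    best = rng.randrange(n_solutions)
    replicate(best)
    for _ in range(iterations):
        u = rng.random()
        if u < q:
            candidate = best                          # 1. refine current x*
        elif u < q + (1.0 - q) * p:
            candidate = rng.randrange(n_solutions)    # 2. global sampling
        else:
            step = rng.choice((-1, 1))                # 3. local sampling
            candidate = min(max(best + step, 0), n_solutions - 1)
        replicate(candidate)
        # The incumbent is the visited solution with the lowest sample mean.
        best = min(counts, key=sample_mean)
    return best, counts[best]

def simulate(x, rng):
    """Hypothetical noisy cost with its minimum at x = 17."""
    return abs(x - 17) + rng.gauss(0.0, 1.0)

best, visits = rbeese_style_search(simulate, n_solutions=50)
```

The refinement step is what protects against keeping a solution that merely got one lucky observation.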

Locally Convergent Adaptive Random Search

Similar to GCARS, but with a statistical procedure to test the local optimality of x*.

COMPASS (Convergent Optimization via Most-Promising-Area Stochastic Search)

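Initialization: sample a neighborhood of solutions and retain the best
Move to the next neighborhood by choosing $\Theta_m=\{x\in \Theta: ||x-x^*||\leq ||x-y||, \forall y\in \mathcal{V}_m, y\neq x^*\}$, where $\mathcal{V}_m$ is the set of solutions visited so far. In other words, always focus on the solutions that are at least as close to x* as to any other visited solution. Each visited solution adds one linear constraint, and an LP can be solved to find the constraints that actually define the neighborhood. This is called constraint pruning.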
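A membership test for the most promising area can be sketched as below, assuming the usual COMPASS definition in which a solution is kept when it is at least as close to x* as to every other visited solution. Solutions are represented as integer tuples; the example points are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def in_most_promising_area(x, x_star, visited):
    """True if x is at least as close to x_star as to every other visited point."""
    return all(
        squared_distance(x, x_star) <= squared_distance(x, y)
        for y in visited
        if y != x_star
    )

def most_promising_area(candidates, x_star, visited):
    """Filter a list of candidate solutions down to the most promising area."""
    return [x for x in candidates
            if in_most_promising_area(x, x_star, visited)]

x_star = (5, 5)
visited = [(5, 5), (9, 5), (5, 1)]
area = most_promising_area([(6, 5), (7, 5), (8, 5), (5, 3), (5, 2)],
                           x_star, visited)
```

This brute-force check scans all visited solutions; constraint pruning exists precisely to avoid carrying redundant constraints as the visited set grows.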

AHA (Adaptive Hyperbox Algorithm)

Like COMPASS, but the neighborhood is defined as the hyperbox around x*: $\mathcal{H}_m=\{x:l_m^{(k)}\leq x^{(k)}\leq u_m^{(k)}, 1\leq k\leq d\}$, where d is the dimension of x.
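One way to build such a hyperbox is to take, in each coordinate, the closest visited values on either side of x*. That construction is my understanding of AHA and is an assumption of this sketch, as are the feasible-region bounds used when no visited solution lies on one side.

```python
def hyperbox(x_star, visited, lower_bounds, upper_bounds):
    """Per-coordinate bounds (l, u) of the hyperbox around x_star."""
    lows, highs = [], []
    for k, center in enumerate(x_star):
        below = [y[k] for y in visited if y[k] < center]
        above = [y[k] for y in visited if y[k] > center]
        # Closest visited coordinate on each side, else the feasible bound.
        lows.append(max(below) if below else lower_bounds[k])
        highs.append(min(above) if above else upper_bounds[k])
    return lows, highs

x_star = (5, 5)
visited = [(5, 5), (9, 5), (5, 1), (2, 8)]
lows, highs = hyperbox(x_star, visited,
                       lower_bounds=[0, 0], upper_bounds=[10, 10])
```

Compared with the COMPASS region, the box needs only 2d numbers to describe, whatever the number of visited solutions.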

Commercial Solvers

Most simulation modeling software includes an SBO tool, but most of these tools are based on R&S or on metaheuristics like SA (simulated annealing). Metaheuristics have been observed to be effective on difficult deterministic optimization problems, but they usually provide no performance guarantees. Some advice:

Do preliminary tests to control sampling variability
Re-run the solver several times (multi-start with different random seeds)
Estimate the final set of solutions carefully to be sure to select the best.
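The last two pieces of advice can be combined in a small wrapper. In the sketch below `noisy_local_search` is only a stand-in for a commercial black-box solver, and the toy `simulate` function, seeds and replication counts are assumptions.

```python
import random
from statistics import mean

def noisy_local_search(simulate, n_solutions, rng, steps=200):
    """Stand-in for a black-box solver: greedy walk on single noisy observations."""
    x = rng.randrange(n_solutions)
    fx = simulate(x, rng)
    for _ in range(steps):
        y = min(max(x + rng.choice((-1, 1)), 0), n_solutions - 1)
        fy = simulate(y, rng)
        if fy < fx:
            x, fx = y, fy
    return x

def multi_start_with_cleanup(simulate, n_solutions, seeds, final_reps=200):
    """Run the solver once per seed, then re-estimate the finalists carefully."""
    finalists = {noisy_local_search(simulate, n_solutions, random.Random(s))
                 for s in seeds}
    # Clean-up: fresh, larger samples that are independent of the search.
    rng = random.Random(12345)
    estimates = {x: mean(simulate(x, rng) for _ in range(final_reps))
                 for x in finalists}
    best = min(estimates, key=estimates.get)
    return best, estimates

def simulate(x, rng):
    """Hypothetical noisy cost with its minimum at x = 17."""
    return abs(x - 17) + rng.gauss(0.0, 1.0)

best, estimates = multi_start_with_cleanup(simulate, 50, seeds=range(8))
```

Using fresh replications in the clean-up step matters because the values seen during the search are biased toward lucky observations.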

Conclusion

Most of the above-mentioned algorithms are black-box algorithms that do not depend on problem structure. Problem structure can still be taken into account, for instance when defining the neighborhood in LCRS (locally convergent random search).
