I am currently learning stochastic optimization (SO) theory and will note important content here. Some book references:
 King & Wallace (2012) Modeling with Stochastic Programming. This one focuses on how to model a problem rather than how to solve it. Less math is involved, so it is good for beginners.
 Birge, John R., and Francois Louveaux. Introduction to stochastic programming. Springer Science & Business Media, 2011.
 Shapiro, Alexander, Darinka Dentcheva, and Andrzej Ruszczyński. Lectures on stochastic programming: modeling and theory. Society for Industrial and Applied Mathematics, 2014.
Why do we need it
 Real-world problems are never deterministic.
 Sensitivity Analysis is not a way of handling uncertainty. It’s an analysis of deterministic decision problems.
 Parameter estimates are always wrong.
Concepts
 Stage: a point in time where we make decisions after learning something new. Stages must be identified first. Different models come with different definitions of stages.
 Period: the time clock, often defined according to the occurrence of random events. For example, consider a facility location problem with random daily demands in the future. Fixing the location of the facilities is a one-stage decision, while the random demands occur every day, so the problem is multi-period.
 Transient modeling vs. steady-state modeling: the former means we make a decision now based on what we know now, while the latter provides all possible decisions for all possible scenarios. SO is about the former. The latter, which I’d call ‘anticipativity’, is treated by stochastic DP…
 Time line: an SO model is not clear until the time line is drawn:
1st-stage decision (here and now) -> some uncertainty disclosed -> 2nd-stage decision (wait and see) -> some uncertainty disclosed…
Stages
As I understand it, the basic form of a two-stage problem is:
optimize $f(x) + E_{\xi}(\min y(w,x) + q(w))$
where $x$ is the first-stage decision and $w\in \xi$ is a random event. The objective is to find the optimal first-stage decision, i.e. the one that yields the best expected profit/cost.
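
As a minimal sketch of the two-stage form above, consider a hypothetical newsvendor: the first-stage decision `x` is the order quantity, the random event is the demand, and the second-stage decision `y` is how much to sell once the demand is known. All numbers and function names are made up for illustration, and the first stage is solved by plain enumeration rather than by a real solver.

```python
# Hypothetical data: (demand, probability) pairs describing the uncertainty.
SCENARIOS = [(50, 0.3), (100, 0.5), (150, 0.2)]
UNIT_COST = 4.0    # first-stage cost per unit ordered
UNIT_PRICE = 10.0  # second-stage revenue per unit sold


def second_stage_cost(x, demand):
    """Best recourse once demand is known: sell as much as possible.

    The second-stage decision y must satisfy 0 <= y <= min(x, demand);
    since selling only earns revenue, the optimum is y = min(x, demand).
    """
    y = min(x, demand)
    return -UNIT_PRICE * y


def expected_total_cost(x):
    """First-stage cost plus the expected optimal second-stage cost."""
    first_stage = UNIT_COST * x
    recourse = sum(p * second_stage_cost(x, d) for d, p in SCENARIOS)
    return first_stage + recourse


def solve_by_enumeration(candidates):
    """Pick the first-stage decision with the lowest expected total cost."""
    return min(candidates, key=expected_total_cost)


best_x = solve_by_enumeration(range(0, 201))
```

Only `best_x` is implemented today; the second-stage decisions are recomputed later, once the demand is actually observed.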

Inherently two-stage: the first stage is a long-term investment and the second stage is the short-term usage of this investment. Two stages, two different decision types.

Inherently multistage: many stages, same type of decision.

Two-stage or multistage? It depends on the modeler. If decisions are made in a rolling-horizon manner, then instead of building a multistage model we may use a two-stage model as a simplification. Later stages are aggregated since we do not need that much detail.

Nonanticipativity: if two scenarios are indistinguishable before time t, then the action taken at time t on each scenario must be the same.
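
The condition can be checked mechanically on a small scenario tree. The sketch below assumes a hypothetical representation in which each scenario is a list of observed outcomes and `decisions[s][t]` is the action taken at time `t` after seeing only the first `t` outcomes; none of these names come from the references.

```python
from collections import defaultdict


def is_nonanticipative(scenarios, decisions):
    """Check that decisions depend only on information revealed so far.

    scenarios: dict name -> list of observed outcomes, one per period
    decisions: dict name -> list of decisions, where decisions[s][t]
               is taken at time t after seeing scenarios[s][:t]
    """
    seen = defaultdict(set)
    for name, path in scenarios.items():
        for t, action in enumerate(decisions[name]):
            history = tuple(path[:t])  # what is known before deciding at t
            seen[(t, history)].add(action)
    # Every (time, history) pair must map to exactly one action.
    return all(len(actions) == 1 for actions in seen.values())


# Hypothetical tree: both scenarios share the root, then diverge.
scenarios = {"high": ["up"], "low": ["down"]}
ok = {"high": [10, 5], "low": [10, 0]}   # same action at t=0
bad = {"high": [12, 5], "low": [8, 0]}   # peeks at the future at t=0
```

Here `is_nonanticipative(scenarios, ok)` holds, while `bad` violates the condition because the two scenarios are indistinguishable at time 0 yet receive different actions.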

Horizon effect in multistage SO: some multistage problems have an infinite time horizon. When the horizon is long, we approach a steady state, but in SO we only care about the transient decision, so we have to represent the steady state in the model in order to obtain the right transient behavior. We don’t really care what to do far in the future.
To deal with the horizon effect, King & Wallace (2012) present what they call Dual Equilibrium as a tool for incorporating the infinite-horizon steady-state effect into the SO model. Check chapter 2.3.6.
Scenario generation
An SO problem is often solved in its deterministic form, based on a number of scenarios generated to describe the uncertainty. To do this, we must first generate the scenarios correctly.
It’s important to realize that we pass from random variables to a discretization because of the algorithm we choose. So it’s important to ensure that the discretization is not too far from reality with respect to the algorithm!
Algorithms that do not need scenarios as input
 stochastic decomposition: needs a very efficient implementation and applies only to linear programs.
 stochastic quasigradients
 importance sampling
Algorithms that use scenario trees as input
Two problems:
 A small number of scenarios => bad result. Quality depends on the randomness of generation.
 A great number of scenarios => too large to be solved…
Where to sample from: if you do not have reliable information on the true distribution, just use the empirical one!
What is a good discretization?
It depends on the model! Our aim is not to approximate the real distribution; instead, we want the algorithm to behave as if it were using the real distribution.
Approach 1
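 in-sample stability: $f(x^*_i,\mathcal{T}_i)\approx f(x^*_j,\mathcal{T}_j)$, where $x^*_i$ is the optimal solution obtained on scenario tree $\mathcal{T}_i$ and $f(x,\mathcal{T})$ is the objective value of $x$ evaluated on tree $\mathcal{T}$.
 out-of-sample stability: $f(x^*_i,\xi)\approx f(x^*_j,\xi)$. These values can be estimated with a simulation model using many more scenarios, or more simply by testing $f(x^*_i,\mathcal{T}_j)\approx f(x^*_j,\mathcal{T}_i)$.
 Bias. A scenario generation method can be both in-sample and out-of-sample stable and still be bad… This should be tested by statistical methods, i.e. by evaluating the quality of the solution in addition to the stability.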
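
The two stability tests can be sketched on a toy problem. The example below assumes a hypothetical newsvendor with demand drawn from Uniform(50, 150) and uses plain random sampling as a stand-in scenario generator; the "true" distribution is approximated by one large reference tree. It reports the spread (max minus min) of the in-sample and out-of-sample objective values over several trees.

```python
import random

UNIT_COST, UNIT_PRICE = 4.0, 10.0  # hypothetical newsvendor data


def objective(x, tree):
    """Expected cost of ordering x under equiprobable demand scenarios."""
    return UNIT_COST * x - UNIT_PRICE * sum(min(x, d) for d in tree) / len(tree)


def solve(tree):
    """Optimal order on a tree; for this problem an optimum lies at a scenario value."""
    return min(tree, key=lambda x: objective(x, tree))


def sample_tree(rng, size):
    """Stand-in scenario generator: demand ~ Uniform(50, 150)."""
    return [rng.uniform(50, 150) for _ in range(size)]


def spread(values):
    return max(values) - min(values)


def stability_report(tree_size, n_trees=10, seed=0):
    """Return (in-sample spread, out-of-sample spread) over n_trees trees."""
    rng = random.Random(seed)
    trees = [sample_tree(rng, tree_size) for _ in range(n_trees)]
    solutions = [solve(t) for t in trees]
    reference = sample_tree(rng, 5000)  # large tree standing in for xi
    in_sample = [objective(x, t) for x, t in zip(solutions, trees)]
    out_sample = [objective(x, reference) for x in solutions]
    return spread(in_sample), spread(out_sample)


small = stability_report(tree_size=5)
large = stability_report(tree_size=500)
```

A generator is considered stable for this model when both spreads are small relative to the objective value; comparing `small` and `large` shows how the tree size affects that.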
Approach 2
Replicate the distribution by matching some of its important properties, such as the first, second and third moments… Use a regression method to do this, or use the iterative procedure provided by King & Wallace (2012).
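
To make the idea concrete, the sketch below measures how far a discrete scenario set is from a set of target moments. It is only an illustrative mismatch function with made-up targets, not the iterative procedure from King & Wallace (2012); a real generator would minimize this mismatch over the scenario values and probabilities.

```python
import math


def moments(values, probs):
    """Mean, variance and third central moment of a discrete distribution."""
    mean = sum(p * v for v, p in zip(values, probs))
    var = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    third = sum(p * (v - mean) ** 3 for v, p in zip(values, probs))
    return mean, var, third


def moment_mismatch(values, probs, target):
    """Sum of squared deviations from the target moments (0 = exact match)."""
    return sum((m - t) ** 2 for m, t in zip(moments(values, probs), target))


# Hypothetical target: mean 100, variance 400, zero third central moment.
target = (100.0, 400.0, 0.0)
d = math.sqrt(3 * 400.0)  # spacing chosen so that the variance matches
matched = moment_mismatch([100 - d, 100.0, 100 + d], [1 / 6, 2 / 3, 1 / 6], target)
naive = moment_mismatch([80.0, 100.0, 150.0], [1 / 3, 1 / 3, 1 / 3], target)
```

The symmetric three-point set reproduces all three target moments up to rounding, while the arbitrarily chosen set does not, even though both use only three scenarios.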
Approach 3
Generate scenarios by minimizing the distance between the generated distribution and the real one. These are called scenario reduction methods. They are used when we somehow know the distribution but want to represent it with a minimum number of scenarios. This requires integrating an optimization procedure into the scenario generation module.
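
One simple heuristic in this spirit is greedy forward selection. The sketch below assumes one-dimensional scenarios and uses the probability-weighted distance to the nearest kept scenario as the distance measure; both choices are assumptions for illustration, and the greedy result is not guaranteed to be the best possible subset.

```python
def reduce_scenarios(values, probs, keep):
    """Greedy forward selection of `keep` scenarios from a 1-D scenario set.

    At each step, add the scenario that most reduces the probability-weighted
    distance from every original scenario to its nearest kept scenario.
    """
    def distance_to(kept):
        return sum(p * min(abs(v - values[j]) for j in kept)
                   for v, p in zip(values, probs))

    kept = []
    while len(kept) < keep:
        remaining = [j for j in range(len(values)) if j not in kept]
        kept.append(min(remaining, key=lambda j: distance_to(kept + [j])))

    # Move the probability of each dropped scenario to its nearest kept one.
    new_probs = {j: 0.0 for j in kept}
    for v, p in zip(values, probs):
        nearest = min(kept, key=lambda j: abs(v - values[j]))
        new_probs[nearest] += p
    return [(values[j], new_probs[j]) for j in sorted(kept)]


# Hypothetical demand scenarios and probabilities.
values = [40.0, 50.0, 60.0, 100.0, 110.0]
probs = [0.2, 0.4, 0.1, 0.2, 0.1]
reduced = reduce_scenarios(values, probs, keep=2)
```

With these numbers the five scenarios collapse onto the values 50 and 100, which absorb the probabilities of their neighbours (about 0.7 and 0.3 respectively).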
Optimality gap estimators
 Use the average of the optimal objective values obtained on many trees as an estimator of the optimal value over the true distribution; some bound results can then be derived.
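
A rough sketch of such an estimator, for a minimization problem, is given below. It assumes the same hypothetical newsvendor with Uniform(50, 150) demand: the average optimal value over many sampled trees estimates a lower bound on the true optimal cost, and evaluating one fixed candidate solution on a large independent sample estimates an upper bound. The sample sizes are arbitrary, and no confidence intervals are computed.

```python
import random
import statistics

UNIT_COST, UNIT_PRICE = 4.0, 10.0  # hypothetical newsvendor data


def objective(x, tree):
    """Expected cost of ordering x under equiprobable demand scenarios."""
    return UNIT_COST * x - UNIT_PRICE * sum(min(x, d) for d in tree) / len(tree)


def solve(tree):
    """Return (optimal order, optimal value) on one sampled tree."""
    best = min(tree, key=lambda x: objective(x, tree))
    return best, objective(best, tree)


def estimate_gap(n_trees=30, tree_size=50, eval_size=5000, seed=1):
    rng = random.Random(seed)

    def draw(n):
        return [rng.uniform(50, 150) for _ in range(n)]

    results = [solve(draw(tree_size)) for _ in range(n_trees)]
    # Averaging optimal values over trees estimates a lower bound (in
    # expectation) on the true optimal cost of a minimization problem.
    lower = statistics.mean(value for _, value in results)
    # A fixed candidate evaluated on fresh scenarios estimates an upper bound.
    candidate = results[0][0]
    upper = objective(candidate, draw(eval_size))
    return lower, upper, upper - lower


lower, upper, gap = estimate_gap()
```

Because both bounds are themselves sample estimates, the estimated gap is noisy and can even come out slightly negative; in practice it would be reported together with a confidence interval.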