Anurag Rai and Dr. Sean Warnick, Computer Science
There are many instances in our world today where a distinct decision must be made at each stage of a process. Any decision made has both immediate and long term impacts. Hence, when making these decisions any low present cost must be balanced with the possibly high future costs. Choosing the best set of decisions, which minimizes the total cost, is the essence of sequential decision making problems.
Sequential Decision Making(SDM) problems evolve in stages and this evolution is guided by the decisions made. These problems can be modeled as:
where J(xk) is the optimal cost of transitioning from state xk to state xN. Thus, the optimal cost for the system is given by J(x0).
Computing a solution using the DP algorithm is easy for small problems, i.e. problems with a limited number of stages. This algorithm also works very well and produces polynomial time solutions for some specifi c problems like sequence alignment (DNA, protein sequence, etc.) using the NeedlemanWunsch algorithm, a variation of DP. However, DP algorithms are often intractable even for modest size problems because of the Curse of Dimensionality. So, approximation methods are used along with DP to approximate the solution.
Solving a SDM problem using DP requires the computation of all the transitions costs weighting the edges of the SDM transition graph. The transition cost, C, is a function of the state xk and the input uk. Depending on the problem, the amount of computation required to calculate this cost varies. Sometimes it is easy, such as in the case of the Traveling Salesperson Problem where the distance between all the cities can be listed in a lookup table. In this case, the computation of the transition cost becomes equivalent to the cost of accessing the table. This happens when the values of states are repeated at diff erent stages.
In other situations, the values of the states at di fferent stages may be diff erent, so every path of the transition graph must be evaluated separately. Since SDM problems in real applications can have a large number of stages, the transition cost computation erodes the tractability of the DP algorithm. To help us in those situations, we develop a method called the Cost Duplication Heuristic (CDH) which can be used to get the transition costs without evaluating every path in the transition graph, if the problems meets certain criterion (which are discussed in [1]).
This method replaces the original cost function C(xk; uk) by C(uk-t; uk-(t-1); …; uk), a function that is easy to compute, which depends on the previous input(s) and the current input, but not on the state xk. Here t is the number of previous states used in the approximation. Since the approximation uses t previous inputs, it is called a t-step approximation.
When using a t-step CDH, a simulation is used to compute the true transition costs for all the possible (t+1) length sequences of the inputs. For example, if there are two possible inputs X and Y , and a 2-step approximation is used, the simulation should generate costs for the following sets of decisions: Decisions at stage 0: (φ,φ,X), (φ, φ, Y ), Decisions at stage 1: (φ,X,X), (φ,X, Y ),(φ Y,X), (φ, Y, Y ), Decisions at stage 2 and above: (X,X,X), (X,X, Y ), (X, Y,X), (X, Y, Y ), (Y,X,X), (Y,X, Y ), (Y, Y,X), (Y, Y, Y ).
Here, the 0th and the 1st stages do not have enough previous inputs so represents an empty input. Note that this computation generates the costs for a portion of the DP transition graph, and as the length of the approximation, t, grows larger, the approximation approaches the optimal solution. This generates a map from these sequences of inputs to the transition costs which can be stored in a lookup table, and these costs can be used to approximate the cost for any sequence of inputs, e.g., the cost of the sequence (X, Y,X,X, Y, Y ) (φ, φ,X) + (φ, X, Y ) + (X, Y, X) + (Y,X,X)+(X,X, Y )+(X, Y, Y ). After getting the approximate transition costs, the dynamic pro- gramming equations 1 and 2 can be used with the new cost function, C, to approximate the solution.
References
- A. Rai, \ A Technique for Mitigating the Curse of Dimensionality in Sequential Decision Making
Problems,” BYU Honors Thesis, 2010.