## Finite State Markovian Decision Processes |

### From inside the book

Results 1-3 of 22

Page 14

Vn(R, i, hn_ ,) = ERI f W, | Yn = i, Wn_ , = *. U»» = Z D/(A»-1, 0w,.a + Z Z A/U.-,, 0«

y(fl) a J « xERl £ ^(|yn+1=;, y, U=n+l = a, 1 Z 0/(*,,-1, owta + Z «,»^ J The first

inequality follows from Lemma 1 and the last equation follows from the induction

assumption. The right-hand side is again independent of #„_1. Hence, Vn(R*, i, /;„

_1) = Vn*(i), ieI. This proves the theorem.

V*+ ^j)}, / e /, n = 0, a 1 T. Proof: The equations follow from the fact that Vn*(i) = Vn

(R*, ...

Vn(R, i, hn_ ,) = ERI f W, | Yn = i, Wn_ , = *. U»» = Z D/(A»-1, 0w,.a + Z Z A/U.-,, 0«

y(fl) a J « xERl £ ^(|yn+1=;, y, U=n+l = a, 1 Z 0/(*,,-1, owta + Z «,»^ J The first

inequality follows from Lemma 1 and the last equation follows from the induction

assumption. The right-hand side is again independent of #„_1. Hence, Vn(R*, i, /;„

_1) = Vn*(i), ieI. This proves the theorem.

**COROLLARY**1 . V*(i) = min{wia + qij(a)V*+ ^j)}, / e /, n = 0, a 1 T. Proof: The equations follow from the fact that Vn*(i) = Vn

(R*, ...

Page 44

Proof: It follows from the complementary slackness property of primal and dual

linear programming problems (Theorem 5, Appendix C) that if {vj , j e 1} is optimal

for the primal problem, then « I <lij(a)Vj for those values of / and a where xia > 0.

However, we have seen in the proof of Theorem 3 that for each / el, xia > 0 for

some a. Therefore, if we consider the constraints of the primal problem, (3.2) must

...

**COROLLARY**3. For every optimal solution to the primal problem, (3.2) must hold.Proof: It follows from the complementary slackness property of primal and dual

linear programming problems (Theorem 5, Appendix C) that if {vj , j e 1} is optimal

for the primal problem, then « I <lij(a)Vj for those values of / and a where xia > 0.

However, we have seen in the proof of Theorem 3 that for each / el, xia > 0 for

some a. Therefore, if we consider the constraints of the primal problem, (3.2) must

...

Page 93

there exists an R e CM such that HR(i) = HR(i). Proof: Since XTK(i) = XTR(i) for

every rif R is the policy constructed in Theorem 1, the

significance of Theorem 1 and its

involving expected frequencies of state and decision in its cost criterion and

constraints, only policies in CM need be considered. That is, if /?1 (say) is any

other policy that ...

**COROLLARY**1. Let HR(i) be the set of limit points of (XTK(i), T = 0, 1, . . .} ; thenthere exists an R e CM such that HR(i) = HR(i). Proof: Since XTK(i) = XTR(i) for

every rif R is the policy constructed in Theorem 1, the

**corollary**is evident. Thesignificance of Theorem 1 and its

**corollary**is that for any optimization probleminvolving expected frequencies of state and decision in its cost criterion and

constraints, only policies in CM need be considered. That is, if /?1 (say) is any

other policy that ...

### What people are saying - Write a review

We haven't found any reviews in the usual places.

### Contents

Finite Horizon Expected Cost Minimization | 11 |

Some Existence Theorems | 19 |

Expected Average Cost Problem | 25 |

Copyright | |

14 other sections not shown

### Other editions - View all

### Common terms and phrases

AOQL Appendix aR(i arbitrary assume backward induction Bellman Bibliographical Remarks Chapter compact compact spaces computational constraints continuous function converges convex set Corollary defined denote Derman discounted cost criterion dual linear programming dual problem dynamic programming equations example expected average cost expected cost expected discounted cost extreme point feasible solution finite number follows HD(i Hence HR(i Hs(i inspection laws of motion Lemma lim inf linear programming problem Markov chain Markovian decision process matrix maximize method of successive nondecreasing nonnegative optimal first-passage problem optimal policy optimal solution optimal stopping policy improvement iteration policy improvement procedure policy in CD primal problem random variables ReC Proof satisfies solving stochastic process strict inequality holding successive approximations super-regular function Suppose takes action theorem is proved transient transition probabilities traveling salesman problem Veinott vn(i Vn(R XTR(i