Discounted Dynamic Programming
Minimizing Costs: Negative Dynamic Programming
Maximizing Rewards: Positive Dynamic Programming
5 other sections not shown
assume average return bandit bets bounded function chooses action concave function Consider convex convex function counterexample decision decreasing order density Derman determine the optimal discounted return distribution dynamic programming earned equal exists expected cost expected makespan expected return expected reward expected total exponential finite given going broke Hardy's theorem Hence induction hypothesis initial issue item Jensen's inequality Lemma likelihood ratio linear programming machine Markov chain max[R(i maximal expected maximize the expected maximizes the probability minimizes model of Section n-stage problem nonstationary policy objective obtain optimal policy optimal value function optimality equation policy that chooses posterior probability processors Proof Proposition prove random variable remaining result follows retirement reward R(i Ross satisfies the optimality scheduling sequence sequential stage stationary policy stochastically maximizes stopping suppose tasks terminal reward Theorem 3.2 timid strategy total expected transition probabilities Vn(i