On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient
Likelihood ratio policy gradient methods have been some of the most successful reinforcement learning algorithms, especially for learning on physical systems. We describe how the likelihood ratio policy gradient can be derived from an importance sampling perspective. This derivation highlights how likelihood ratio methods under-use past experience by (a) using the past experience only to estimate the gradient of the expected return at the current policy parameterization, rather than to obtain a more complete estimate of the expected return itself, and (b) using only past experience collected under the current policy rather than all past experience to improve the estimates. We present a new policy search method which leverages both of these observations, as well as generalized baselines, a new technique which generalizes commonly used baseline techniques for policy gradient methods. Our algorithm outperforms standard likelihood ratio policy gradient algorithms on several testbeds.
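The connection mentioned in the abstract can be sketched with the standard importance-sampling argument; the notation below (U(θ) for expected return, p_θ(τ) for the trajectory distribution under policy parameters θ, R(τ) for the return of trajectory τ, θ_0 for the current parameters) is generic and not necessarily the exact formulation used in the lecture.

\[
U(\theta) \;=\; \mathbb{E}_{\tau \sim p_\theta}\!\left[R(\tau)\right]
\;=\; \mathbb{E}_{\tau \sim p_{\theta_0}}\!\left[\frac{p_\theta(\tau)}{p_{\theta_0}(\tau)}\, R(\tau)\right],
\]
\[
\nabla_\theta U(\theta)\Big|_{\theta=\theta_0}
\;=\; \mathbb{E}_{\tau \sim p_{\theta_0}}\!\left[\frac{\nabla_\theta p_\theta(\tau)\big|_{\theta_0}}{p_{\theta_0}(\tau)}\, R(\tau)\right]
\;=\; \mathbb{E}_{\tau \sim p_{\theta_0}}\!\left[\nabla_\theta \log p_\theta(\tau)\big|_{\theta_0}\, R(\tau)\right].
\]

Differentiating the importance-sampled estimate of U(θ) and evaluating at θ = θ_0 recovers the likelihood ratio (REINFORCE-style) gradient, which illustrates the abstract's point: samples from p_{θ_0} are used only to form a local gradient estimate, even though the same re-weighted objective could be estimated over a whole neighborhood of θ.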
Channel: VideoLectures
Category: Educational
Video Length: 0
Date Found: March 26, 2011
Date Produced: March 25, 2011
View Count: 0