
Course catalog: Prediction and Control with Function Approximation
Course outline:

    Prediction and Control with Function Approximation

Welcome to the Course!

Welcome to the third course in the Reinforcement Learning Specialization:

Prediction and Control with Function Approximation, brought to you by the University of Alberta,

Onlea, and Coursera.

In this pre-course module, you'll be introduced to your instructors,

and get a flavour of what the course has in store for you.

Make sure to introduce yourself to your classmates in the "Meet and Greet" section!

On-policy Prediction with Approximation

This week you will learn how to estimate a value function for a given policy,

when the number of states is much larger than the memory available to the agent.

You will learn how to specify a parametric form of the value function,

how to specify an objective function, and how gradient descent can be used to estimate values from interaction with the world.
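
As a rough illustration of this module's core idea, here is a minimal sketch of semi-gradient TD(0) prediction with a linear value function v(s, w) = w^T x(s), under assumed placeholder interfaces: `env`, `policy`, and `features` are invented for the example and are not part of the course materials.

```python
import numpy as np

def semi_gradient_td0(env, policy, features, num_features,
                      alpha=0.01, gamma=0.99, num_episodes=100):
    """Estimate v_pi with a linear value function v(s, w) = w^T x(s).

    Assumed placeholder interfaces:
      env.reset() -> state;  env.step(action) -> (next_state, reward, done)
      policy(state) -> action;  features(state) -> np.ndarray of length num_features
    """
    w = np.zeros(num_features)
    for _ in range(num_episodes):
        state, done = env.reset(), False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            x = features(state)
            # Semi-gradient: the bootstrapped target is treated as a constant
            # when differentiating, so the update only uses the gradient of
            # the current estimate w^T x, which is simply x.
            target = reward if done else reward + gamma * np.dot(w, features(next_state))
            w += alpha * (target - np.dot(w, x)) * x
            state = next_state
    return w
```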

Constructing Features for Prediction

The features used to construct the agent’s value estimates are perhaps the most crucial part of a successful learning system.

In this module we discuss two basic strategies for constructing features: (1) fixed basis functions that form an exhaustive partition of the input,

and (2) adapting the features while the agent interacts with the world via Neural Networks and Backpropagation.
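
To make the first strategy concrete, here is a small sketch of a fixed basis built by state aggregation: a 1-D input range is exhaustively partitioned into bins, and every state in a bin shares one one-hot feature. The function name and interface are illustrative assumptions; it could serve as the `features` function in the prediction sketch above.

```python
import numpy as np

def aggregation_features(state, low, high, num_bins):
    """One-hot state-aggregation features over the interval [low, high).

    The bins form an exhaustive partition of the input: every state maps to
    exactly one bin, and all states in a bin share a single learned value.
    """
    x = np.zeros(num_bins)
    idx = int((state - low) / (high - low) * num_bins)
    idx = min(max(idx, 0), num_bins - 1)  # clip states that fall outside the range
    x[idx] = 1.0
    return x
```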

In this week’s graded assessment you will solve a simple but infinite-state prediction task with a Neural Network and TD learning.

Control with Approximation

This week,

you will see that the concepts and tools introduced in modules two and three allow straightforward extension of classic

TD control methods to the function approximation setting. In particular,

you will learn how to find the optimal policy in infinite-state MDPs by simply combining semi-gradient

TD methods with generalized policy iteration, yielding classic control methods like Q-learning and Sarsa.

We conclude with a discussion of a new problem formulation for RL---average reward---which will undoubtedly

be used in many applications of RL in the future.
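
As an illustration of how the prediction methods above extend to control, here is a hedged sketch of episodic semi-gradient Sarsa with a linear action-value function q(s, a, w) = w^T x(s, a). The `env` and `features` interfaces are placeholder assumptions, and this is a sketch of the general technique rather than the course's reference implementation.

```python
import numpy as np

def semi_gradient_sarsa(env, features, num_features, num_actions,
                        alpha=0.1, gamma=1.0, epsilon=0.1, num_episodes=200):
    """Episodic semi-gradient Sarsa with q(s, a, w) = w^T x(s, a).

    Assumed placeholder interface: features(state, action) -> np.ndarray.
    """
    w = np.zeros(num_features)

    def epsilon_greedy(state):
        # Generalized policy iteration: act (nearly) greedily with respect to
        # the current action-value estimates while continuing to explore.
        if np.random.rand() < epsilon:
            return np.random.randint(num_actions)
        q_values = [np.dot(w, features(state, a)) for a in range(num_actions)]
        return int(np.argmax(q_values))

    for _ in range(num_episodes):
        state = env.reset()
        action = epsilon_greedy(state)
        done = False
        while not done:
            next_state, reward, done = env.step(action)
            x = features(state, action)
            if done:
                target = reward
            else:
                next_action = epsilon_greedy(next_state)
                target = reward + gamma * np.dot(w, features(next_state, next_action))
            w += alpha * (target - np.dot(w, x)) * x  # semi-gradient update
            if not done:
                state, action = next_state, next_action
    return w
```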

Policy Gradient

Every algorithm you have learned about so far estimates

a value function as an intermediate step towards the goal of finding an optimal policy.

An alternative strategy is to directly learn the parameters of the policy.

This week you will learn about these policy gradient methods, and their advantages over value-function based methods.

You will also learn how policy gradient methods can be used

to find the optimal policy in tasks with both continuous state and action spaces.
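
As one illustrative instance of a policy gradient method for continuous actions, here is a minimal REINFORCE sketch with a Gaussian policy whose mean is linear in the state features; the fixed standard deviation and the `env`/`features` interfaces are assumptions made for the example, not the course's exact formulation.

```python
import numpy as np

def reinforce_gaussian(env, features, num_features,
                       alpha=1e-3, gamma=0.99, sigma=0.5, num_episodes=500):
    """Monte Carlo policy gradient (REINFORCE) with a 1-D Gaussian policy:
    a ~ N(theta^T x(s), sigma^2). The policy parameters theta are learned
    directly, without an intermediate value function.
    """
    theta = np.zeros(num_features)
    for _ in range(num_episodes):
        # Generate one episode by following the current stochastic policy.
        feats, actions, rewards = [], [], []
        state, done = env.reset(), False
        while not done:
            x = features(state)
            action = np.random.normal(np.dot(theta, x), sigma)
            next_state, reward, done = env.step(action)
            feats.append(x)
            actions.append(action)
            rewards.append(reward)
            state = next_state
        # Work backwards to form the return G_t, then take a gradient-ascent
        # step along grad log pi(a_t | s_t) = (a_t - mean_t) / sigma^2 * x_t.
        G = 0.0
        for t in reversed(range(len(rewards))):
            G = rewards[t] + gamma * G
            mean = np.dot(theta, feats[t])
            grad_log_pi = (actions[t] - mean) / (sigma ** 2) * feats[t]
            theta += alpha * (gamma ** t) * G * grad_log_pi
    return theta
```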
