What are the differences between supervised learning, unsupervised learning, and reinforcement learning?

Unsupervised learning has become very popular in recent years, with applications to computer vision, audio classification, NLP, and other problems; features learned by a machine in an unsupervised way usually give markedly better accuracy than other approaches to training. This post focuses on Andrew Ng's work on unsupervised learning, giving an introductory walkthrough based on his talk on unsupervised feature learning. Keywords: unsupervised learning, feature extraction, feature learning, Sparse Coding, Sparse DBN, Sparse Matrix, Computer Vision, Audio Classification, NLP.

Unsupervised feature learning and deep learning have been the main research areas of Stanford machine-learning heavyweight Andrew Y. Ng in recent years; in one piece of work this year he solved the problem of building high-level feature detectors from only unlabeled data using unsupervised learning.

=========================Part 1: Traditional Pattern Recognition=========================

Usually, pattern recognition works like this: feature extraction is a required step for every class of problem, and the computer's perception for detection is built on those features. Below we revisit classic features for three kinds of problems: object detection, audio classification, and NLP.

The human visual and auditory systems are extremely complex. If we want a computer to perceive what our visual system sees, there are two approaches. One is to describe the features our visual system extracts when it observes an object (for example, what the parts of different objects look like in 2D and 3D, which features let us tell objects apart, and how object parts connect to each other). The other is more general: can we discover a general algorithm that underlies most of perception, in other words, an algorithm for how the eye goes from seeing to recognizing? If that is not clear, the following two passages may help:

We can try to directly implement what the adult visual (or audio) system is doing. (E.g., implement features that capture different types of invariance, 2d and 3d context, relations between object parts, ...). Or, if there is a more general computational principal/algorithm that underlies most of perception, can we instead try to discover and implement that?

Audio works the same way as images: can one algorithm learn features that describe an image or a piece of audio? For images, the most direct description is the pixels. The traditional approach is supervised learning: given a set of positive examples and a set of negative examples, extract features, train, and then test recognition. Unlike supervised learning, unsupervised feature learning learns the features of an image from a collection of labeled and unlabeled data (learning which features make a motorcycle and which make a car)... So how do we learn which features there are? Below we first introduce one unsupervised learning method, Sparse Coding; try to connect it with what was said above.

=================Part 2: Sparse Coding, an Unsupervised Learning Algorithm=================

Sparse Coding is one kind of unsupervised learning algorithm and can be used for feature learning. Below is my explanation of Sparse Coding, from my notes, illustrated with an example. Suppose that at the lowest layer of image feature extraction we want to generate edge detectors. The job is to randomly select small patches from natural images and generate, from those patches, a set of "bases" that can describe them, e.g. a dictionary of 8x8 = 64 basis patches (for concrete ways of choosing the bases, see my two earlier articles). Then, given a test patch x, we can write it as a linear combination of the bases, x = sum_j a_j phi_j, and the sparse matrix is the coefficient vector a: in the figure, a has 64 dimensions of which only 3 entries are non-zero, hence "sparse".

You may ask: why are edge detectors at the bottom layer, and what sits at the upper layers? A short explanation makes it clear. Edges in different orientations can describe an entire image, so edges in different orientations are the natural basis of images; the next layer up is combinations of the lower layer's bases, the layer above that is combinations of those combinations, and so on (see below). The other examples work the same way; note the accompanying text (point two). One figure shows 20 basis functions (like wavelets) learned from unlabeled audio.
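To make the sparse-coding step concrete, here is a minimal sketch using orthogonal matching pursuit as the sparse solver. The 64-atom random dictionary is a stand-in for bases learned from natural images, and the sparsity level of 3 matches the example above; none of these numbers come from the talk itself.

# Minimal sketch of sparse coding: approximate an 8x8 image patch as a linear
# combination of 64 basis patches with only 3 non-zero coefficients.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.RandomState(0)
D = rng.randn(64, 64)                          # 64 basis patches, each 8*8 = 64 pixels
D /= np.linalg.norm(D, axis=1, keepdims=True)  # normalize each basis

patch = rng.randn(64)                          # a test patch (flattened 8x8)

# Find coefficients a with at most 3 non-zero entries such that patch ~ D.T @ a
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=3)
omp.fit(D.T, patch)
a = omp.coef_
print("non-zero coefficients:", np.sum(a != 0))   # 3, hence "sparse"
reconstruction = D.T @ a                          # linear combination of the bases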
===================Part 3: Learning Feature Hierarchies & Sparse DBN===================

The automatic feature-learning process built this way is a bottom-up sparse-coding procedure that learns features layer by layer. Take a Sparse DBN trained on faces as an example: from bottom to top, the hierarchy is the input image, Model V1 (edge detectors), Model V2 (object parts), and Model V3 (object models). My notes explain the figure as follows: the 24 basis functions at the bottom are used for edge detection; for example, the basis at the top left detects 85-degree edges. The 32 bases in the middle (object parts) are an eye detector, a nose detector, and so on; they qualify as bases because a face can be composed from these parts. The 24 bases at the top layer are face models.

==========================

When training on different objects, the learned edge bases come out very similar, but the object parts and models become completely different. When the training data consists of four classes of images, the features extracted at the upper layers differ, and the final object models include models specific to each of the four classes. Another figure compares the accuracy of different algorithms on action recognition. A Sparse DBN on audio works the same way: for a spectrogram, features are extracted layer by layer as shown.

===================Part 4: A Practical Issue: Scaling Up===================

A major issue in pattern recognition is feature extraction, and from the accuracy-versus-feature-count figure we can see how different algorithms behave as the number of features grows: generally, the more features, the more information is available, and the better the resulting accuracy. So which methods can mine features in a way that scales up? If you are interested, dig into it, and let's discuss!

===================Part 5: Learning Recursive Representations===================

In this part we use NLP as the example and look at how to do semantic analysis recursively over the structure of natural language. First, a word is represented as a multi-dimensional vector (simplified to 2 dimensions in the figure). Given the sentence "The cat sat on the mat.", we learn features bottom-up; it turns out that some neurons are meaningful and some are not; the neuron marked by the arrow in the figure, for instance, does not make sense.

Training process: parsing a sentence recursively selects the neurons that make sense to become the new neurons of the next layer, and after picking the meaningful neurons at each layer we obtain the final parse of the sentence. Having gone through NLP's sentence-parsing problem, look back at image processing (IP): the idea is the same: find small patches that make sense, combine them into the next layer's features, and recursively learn features upward. In the figure, the top row is NLP and the bottom row is image processing.

===================Summary===================

To wrap up unsupervised feature learning:

o Features are learned by the machine rather than specified by hand.
o It finds the hidden feature basis underlying perception.
o The recognition rates of sparse coding and deep learning on computer vision and audio recognition are very good, close to the state of the art.

More study material on machine learning will continue to be posted; follow this blog and my Sina Weibo.
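To close, a rough illustration of the recursive composition step from Part 5: the sketch below merges two child word vectors with a single weight matrix and scores how much the merged node "makes sense". The matrix shapes and the scoring layer are assumptions in the spirit of recursive neural networks, not the exact model from the talk; W and w_score would normally be learned, and are random stand-ins here.

# Toy sketch of one recursive-composition step over word vectors.
import numpy as np

rng = np.random.RandomState(0)
d = 2                                    # word-vector dimension (2-D as in the figure)
W = rng.randn(d, 2 * d)                  # merges two d-dim children into one d-dim parent
w_score = rng.randn(d)                   # scores whether a merge "makes sense"

def merge(left, right):
    parent = np.tanh(W @ np.concatenate([left, right]))
    score = w_score @ parent             # higher score = more plausible merge
    return parent, score

the, cat = rng.randn(d), rng.randn(d)
parent, score = merge(the, cat)          # candidate node for "the cat"
print(parent, score)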
supervised learning, unsupervised learning, regression - Stack Overflow
I know that:

unsupervised learning is trying to find hidden structure in unlabeled data; otherwise, we call it supervised learning.

regression is similar to classification, except that its output is a continuous range of numeric values.

I also know that classification is a type of supervised learning. But what makes me confused is:

linear regression (line fitting) is a type of regression? If so, why is its data unlabeled? For example, its sample data is just a set of coordinates like (1,2), (2,3), (1,4).

logistic regression (classification) is a type of regression? If so, why is its output just nominal (true or false, 0 or 1)?

Can anyone help me figure this out?
1) Linear regression is supervised because the data you have includes both the input and the output (so to speak). For instance, suppose you have a dataset of car sales at a dealership. For each car you have the make, model, price, color, discount, etc., but you also have the number of sales for each car. If this task were instead unsupervised, you would have a dataset that included, maybe, just the make, model, price, color, etc. (not the actual number of sales), and the best you could do is cluster the data. The example isn't perfect but aims to get the big picture across. A good question to ask yourself when deciding whether a method is supervised is: "Do I have a way of judging the quality of a prediction for an input?" With linear regression data you most certainly do: you just evaluate the value of the function (in this case, the line) at the input data to estimate the output. Not so in the unsupervised case.

2) Logistic regression isn't actually a regression. The name is misleading and does indeed cause much confusion. It is usually used only for binary prediction, which makes it well suited to classification tasks but little else.
Linear regression is supervised. You start with a dataset with a known dependent variable (the label), train your model, then apply it later. You are trying to predict a real number, like the price of a house.

Logistic regression is also supervised. It's more of a classifier than a regression technique, despite its name. You are trying to predict the probability of class membership, like the odds of someone dying.
Examples of unsupervised learning include clustering and association analysis.
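A short scikit-learn sketch of the distinction both answers draw; the toy numbers are made up for illustration:

# Both models are supervised: fit() receives inputs X *and* known outputs y.
from sklearn.linear_model import LinearRegression, LogisticRegression

X = [[1], [2], [3], [4]]          # inputs (e.g., house size)

# Linear regression: y is a real number (a continuous label).
reg = LinearRegression().fit(X, [1.9, 3.1, 3.9, 5.2])
print(reg.predict([[5]]))         # a real-valued prediction

# Logistic regression: y is a class (a nominal label), despite the name.
clf = LogisticRegression().fit(X, [0, 0, 1, 1])
print(clf.predict([[5]]))         # a class prediction: array([1])
print(clf.predict_proba([[5]]))   # class-membership probabilities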
Machine Learning with scikit-learn
Machine learning algorithms
Supervised Learning
Supervised learning is concerned with learning a model from labeled data (training data) which has the correct answer. This allows us to make predictions about future or unseen data.
Regression and classification are the most common types of problems in supervised learning.
"An optimal scenario will allow for the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a 'reasonable' way." - wiki - Supervised learning.
Unsupervised Learning
Unsupervised learning's task is to find structure in unlabeled data, without a known outcome variable to predict.

The unsupervised learning problem is "trying to find hidden structure in unlabeled data. Since the examples given to the learner are unlabeled, there is no error or reward signal to evaluate a potential solution. This distinguishes unsupervised learning from supervised learning and reinforcement learning." - wiki - Unsupervised learning.
In supervised learning, we know the right answer beforehand when we train our model, and in reinforcement learning, we define a measure of reward for particular actions by the agent. In unsupervised learning, however, we are dealing with unlabeled data or data of unknown structure. Using unsupervised learning techniques, we are able to explore the structure of our data to extract meaningful information without the guidance of a known outcome variable or reward function. - Python Machine Learning by Sebastian Raschka
Simply put, the goal of unsupervised learning is to find structure in the unlabeled data. Clustering is probably the most common technique.
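As a quick taste of clustering, here is a minimal k-means run on made-up 2-D points; the data and the choice of k=2 are purely illustrative:

from sklearn.cluster import KMeans

# Six unlabeled 2-D points; no y is passed to fit() -- that is what
# makes this unsupervised.
X = [[1, 1], [1.5, 2], [1, 0], [8, 8], [9, 9], [8, 9]]
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster id per point, e.g. [0 0 0 1 1 1]
print(km.cluster_centers_)  # the two discovered centers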
The third type of machine learning is reinforcement learning. The goal of reinforcement learning is to develop a system that improves its performance based on interactions with the environment. We could think of reinforcement learning as related to supervised learning; however, in reinforcement learning the feedback from the environment is not a label or value, but a measure of how good the action was, as scored by the reward function. Via interaction with the environment, our system (agent) can use reinforcement learning to learn a series of actions that maximizes this reward via an exploratory trial-and-error approach.

A popular example of reinforcement learning is a chess engine. Here, the agent decides upon a series of moves depending on the state of the board (the environment), and the reward can be defined as win or lose at the end of the game:
Credit: Python Machine Learning by Sebastian Raschka, 2015
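A tiny sketch of that reward-driven loop, using tabular Q-learning on a made-up one-dimensional walk (states 0..4, reward only at the right end); everything here is illustrative and is not from the book:

import random

# Made-up environment: states 0..4 on a line; actions -1 (left) / +1 (right);
# reaching state 4 pays reward 1, everything else pays 0.
N_STATES, GOAL = 5, 4
Q = {(s, a): 0.0 for s in range(N_STATES) for a in (-1, 1)}
alpha, gamma, eps = 0.5, 0.9, 0.2

random.seed(0)
for _ in range(200):                        # episodes of trial and error
    s = 0
    while s != GOAL:
        # explore sometimes, otherwise take the currently best action
        a = random.choice((-1, 1)) if random.random() < eps \
            else max((-1, 1), key=lambda a: Q[(s, a)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == GOAL else 0.0      # feedback is a reward, not a label
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, -1)], Q[(s2, 1)]) - Q[(s, a)])
        s = s2

# learned policy: should prefer +1 (move right, toward the reward) in every state
print([max((-1, 1), key=lambda a: Q[(s, a)]) for s in range(N_STATES)])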
Supervised - Classification with iris dataset
The following table is the Iris dataset, a classic example in the field of machine learning.

Credit: Python Machine Learning by Sebastian Raschka, 2015

Our Iris dataset contains the measurements of 150 iris flowers from three different species: Setosa, Versicolor, and Virginica. With four measurements per flower, it can be written as a 150 x 4 matrix. Each flower sample is one row of the dataset, and the flower measurements in centimeters are stored as columns, which we also call the features of the dataset. We are given the measurements of petals and sepals, and the task is to predict the species of an individual flower: a classification task.
>>> from sklearn.datasets import load_iris
>>> iris = load_iris()
>>> X = iris.data
>>> y = iris.target

It is trivial to train a classifier once the data has this format. A support vector machine (SVM) with a linear kernel, for instance:

>>> from sklearn.svm import LinearSVC
>>> clf = LinearSVC()
>>> clf
LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='l2', multi_class='ovr', penalty='l2',
     random_state=None, tol=0.0001, verbose=0)
clf is a statistical model that has hyperparameters that control the learning algorithm. Those hyperparameters can be supplied by the user in the constructor of the model.
By default the real model parameters are not initialized. The model parameters will be automatically tuned from the data by calling the fit() method:
>>> clf = clf.fit(X, y)
>>> clf.coef_          # one row of four coefficients per class (output abridged)
array([[ ...,  ...,  ...,  ...],
       [ ...,  ...,  ...,  ...],
       [ ...,  ...,  ...,  1.8653876 ]])
>>> clf.intercept_     # one intercept per class (output abridged)
array([ ...,  ...,  ...])
Once the model is trained, it can be used to predict the most likely outcome on unseen data. Let's try it with iris.data, using the last sample (output abridged):

>>> iris.data
array([[ 5.1,  3.5,  1.4,  0.2],
       ...])
>>> X_new = [[ 5.9,  3. ,  5.1,  1.8]]
>>> clf.predict(X_new)
array([2])
>>> iris.target_names
array(['setosa', 'versicolor', 'virginica'],
      dtype='|S10')

The result is 2, which is the id of the 3rd iris class, namely 'virginica'.
Supervised - Logistic regression models
scikit-learn logistic regression models can further predict probabilities of the outcome.
We continue to use the data from the previous section.
>>> from sklearn.linear_model import LogisticRegression
>>> clf2 = LogisticRegression().fit(X, y)
>>> clf2
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, penalty='l2', random_state=None, tol=0.0001)
>>> clf2.predict_proba(X_new)
array([[ 0.001...,  0.2810578 ,  0.717...]])

This means that the model estimates that the sample in X_new has:

a 0.1% likelihood of belonging to the 'setosa' class
a 28% likelihood of belonging to the 'versicolor' class
a 71% likelihood of belonging to the 'virginica' class

In fact, the model can predict with the predict() method, which picks the class with the highest probability in the predict_proba() output:

>>> clf2.predict(X_new)
array([2])
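For intuition, predict() here just returns the class whose predicted probability is largest, which we can check by hand (a hypothetical continuation of the session above):

>>> import numpy as np
>>> int(np.argmax(clf2.predict_proba(X_new)))
2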
Note: the logistic regression is not a regression method but a classification method!
When do we use logistic regression?
In probabilistic setups - easy to incorporate prior knowledge
When the number of features is pretty small - The model will tell us which features are important.
When the training speed is an issue - training logistic regression is relatively fast.
When precision is not critical.
Unsupervised - Dimensionality Reduction
Here we want to derive a set of new artificial features that is smaller than the original feature set while retaining most of the variance of the original data. We call this dimensionality reduction.
Principal Component Analysis (PCA) is the most common technique for dimensionality reduction. PCA does this with linear combinations of the original features, using a truncated singular value decomposition of the matrix X to project the data onto a basis of the top singular vectors.
>>> from sklearn.decomposition import PCA
>>> pca = PCA(n_components=2, whiten=True).fit(X)

After the fit(), the pca model exposes the singular vectors in the components_ attribute (output abridged):

>>> pca.components_
array([[ ...,  ...,  ...,  ...],
       [ ...,  ...,  ...,  ...]])
>>> pca.explained_variance_ratio_
array([ 0.92...,  0.05...])
>>> pca.explained_variance_ratio_.sum()
0.97...

Since the number of retained components is 2, we project the iris dataset along those first 2 dimensions:

>>> X_pca = pca.transform(X)
Because we passed whiten=True, the projected data is normalized to zero mean and unit standard deviation:

>>> import numpy as np
>>> np.round(X_pca.mean(axis=0), decimals=5)
array([-0.,  0.])
>>> np.round(X_pca.std(axis=0), decimals=5)
array([ 1.,  1.])

Also note that the samples' components no longer carry any linear correlation:

>>> np.round(np.corrcoef(X_pca.T), decimals=5)
array([[ 1., -0.],
       [-0.,  1.]])
Now we can visualize the dataset using pylab, by defining a small utility function. Here is the code:
from itertools import cycle

import pylab as pl
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA


def plot_2D(data, target, target_names):
    # scatter-plot each class in its own color
    colors = cycle('rgbcmykw')
    target_ids = range(len(target_names))
    pl.figure()
    for i, c, label in zip(target_ids, colors, target_names):
        pl.scatter(data[target == i, 0], data[target == i, 1],
                   c=c, label=label)
    pl.legend()
    pl.show()


class pca_reduction:
    def __init__(self):
        iris = load_iris()
        self.X = iris.data
        self.y = iris.target
        self.names = iris.target_names
        self.plot()

    def plot(self):
        # project onto the first two whitened principal components
        pca = PCA(n_components=2, whiten=True).fit(self.X)
        X_pca = pca.transform(self.X)
        plot_2D(X_pca, self.y, self.names)


if __name__ == '__main__':
    pr = pca_reduction()
    print('X = %s' % pr.X)
    print('y = %s' % pr.y)
    print('names = %s' % pr.names)
The code draws the following picture:
The projection was determined without any help from the labels (represented by the colors), which means this learning is unsupervised. Nevertheless, we see that the projection gives us insight into the distribution of the different flowers in parameter space: notably, iris setosa is much more distinct than the other two species as shown in the picture below:
==========================================================================
I was sick last week and had some coursework to deal with, so progress slowed down. These notes are based on lectures 16 and 17 of Stanford's open course cs229.
==========================================================================
Part 0: Some background

When it comes to controlling robots, many problems cannot be solved by supervised or unsupervised learning. One example is the autonomous helicopter control Andrew Ng keeps bringing up; another is playing board games, where a move made long ago may plant the seed of a later defeat, and it is hard for a machine to judge whether a single move is good or bad. These are the problems reinforcement learning is meant to solve.

Note: the "value" here is what many books write as the Q value. There seems to be a small difference, in that Q is usually Q(s, a), the value given a state and an action, but the difference is minor.
Part 1: Markov decision processes (MDPs)

A Markov decision process is a five-tuple (S, A, {P_sa}, gamma, R). A robot walking around a map serves as the running example for what each element does:

S: the state set, all states that can occur; in the map example, every position the robot can be in.
A: the actions, everything the robot can do; if the robot can only move in four directions, then A = {N, S, E, W}.
P_sa: the probabilities governing what happens when the robot takes action a in state s (clarified further below).
gamma: the discount factor, a number between 0 and 1 that determines how much earlier versus later actions affect the result; in the board-game example it controls how much a single move affects the final outcome. That may sound vague; the discussion below should make it clearer.
R: a reward function, either R: S x A -> R or simply R: S -> R; in the map example, the weights on the map.

===============================================================================

Given such a decision process, the robot's activity on the map takes the form

s0 --a0--> s1 --a1--> s2 --a2--> ...

that is, starting from the initial position, choose an action to reach another state, and continue until the final state. We therefore define the value of such a run as the discounted sum

R(s0) + gamma * R(s1) + gamma^2 * R(s2) + ...

Earlier decisions clearly influence the value the most; after that the influence decays by gamma at each step.
In fact, once an MDP is given, since each element of the tuple is fixed, there exists an optimal policy. A policy assigns an action to each state, and the optimal policy is the one under which, from any initial state, the final state is reached with maximum value. A policy is denoted pi, and

V^pi(s) = E[ R(s0) + gamma * R(s1) + gamma^2 * R(s2) + ... | s0 = s, pi ]

denotes the value obtained under policy pi starting from state s. By the Bellman equation, this also equals

V^pi(s) = R(s) + gamma * sum_{s'} P_{s,pi(s)}(s') * V^pi(s')

Note that this is recursive: to know the value function at s you must first know the value functions at all the s'. (The value function here means V^pi.) We define the optimal policy as pi* and the optimal value function as V*; these two determine each other and can each be obtained from the other.
Part 2: Value iteration and policy iteration (VI & PI)

Value iteration (VI):

This procedure is fairly simple. Since we know the values of R, we keep applying the update

V(s) := R(s) + max_a gamma * sum_{s'} P_sa(s') * V(s')

and V converges to V*; from V* we then obtain the optimal policy pi* simply by checking, for each state, which action yields the largest value. This is the Bellman equation at work; all the V values can be obtained by solving the Bellman equations with a dynamic-programming-style sweep. Note that the P in a Markov decision process refers to an objectively existing probability, e.g., the robot's turns may not land exactly on one direction; it does not mean the probability that the robot chooses action a in state s. I did not make that clear before, so to state it here: P_sa is an objective statistic. A sketch follows below.
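A minimal sketch of value iteration on a made-up 3-state MDP; the transition table and rewards are invented for illustration:

import numpy as np

# Made-up MDP: 3 states, 2 actions; P[a][s, s'] = transition probability,
# R[s] = reward for being in state s; state 2 is the "goal".
P = [np.array([[0.8, 0.2, 0.0],
               [0.1, 0.8, 0.1],
               [0.0, 0.0, 1.0]]),
     np.array([[0.2, 0.8, 0.0],
               [0.0, 0.2, 0.8],
               [0.0, 0.0, 1.0]])]
R = np.array([0.0, 0.0, 1.0])
gamma = 0.9

V = np.zeros(3)
for _ in range(100):                    # repeat until V converges to V*
    # Bellman backup: V(s) := R(s) + max_a gamma * sum_s' P_sa(s') V(s')
    V = R + gamma * np.max([P[a] @ V for a in (0, 1)], axis=0)

pi = np.argmax([P[a] @ V for a in (0, 1)], axis=0)   # greedy policy from V*
print(V, pi)                            # pi prefers action 1 (toward the goal) in states 0 and 1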
Policy iteration (PI):

This time we repeatedly optimize pi itself so that pi converges to pi* and V to V*: evaluate the value of the current pi, then improve pi greedily against it. Because the value of pi must be computed at every iteration, this algorithm is used less often.

The difference between the two algorithms is one of procedure; I feel they do not differ much in essence. (Andrew says they do differ, or at least look different... to be checked.) For contrast, a sketch follows below.
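A sketch of policy iteration, continuing the value-iteration example above (it reuses P, R, and gamma from that sketch): solve the linear Bellman system for the current policy, then improve greedily.

import numpy as np

pi = np.zeros(3, dtype=int)              # start from an arbitrary policy
while True:
    # policy evaluation: solve V = R + gamma * P_pi V (a linear system)
    P_pi = np.array([P[pi[s]][s] for s in range(3)])
    V = np.linalg.solve(np.eye(3) - gamma * P_pi, R)
    # policy improvement: act greedily with respect to V
    new_pi = np.argmax([P[a] @ V for a in (0, 1)], axis=0)
    if np.array_equal(new_pi, pi):       # converged to pi*
        break
    pi = new_pi
print(V, pi)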
Part 3: MDPs with continuous states

So far the states were discrete. States can also be continuous; the classic illustration is the inverted pendulum problem: a long pole stands on a cart on a rail, and the computer must keep it balanced (like balancing a stick upright on your palm). The state of this problem consists of four real values:

x (position on the rail)
theta (angle of the pole)
x' (velocity along the rail)
theta' (angular velocity)
Discretization:

The natural idea is to chop the continuous values into intervals; for example, a two-dimensional continuous region can be split into a grid of discrete cells. But this does not work well, because a discrete representation of a continuous space is, in the end, a finite set of values. Another reason discretization performs poorly is the curse of dimensionality: every continuous variable turns into many discrete values, so with a large number of dimensions there are an enormous number of states, and the required computation grows exponentially with the dimension; a tiny illustration follows.
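A two-line numeric illustration of the blow-up; the grid resolution of 10 cells per dimension is an arbitrary choice:

import numpy as np

bins = np.linspace(-1.0, 1.0, 11)     # 10 intervals per dimension (arbitrary)
print(np.digitize(0.13, bins))        # index of the cell a value falls into
print(10 ** 4)                        # 4-D pendulum state -> 10,000 discrete states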
The simulator approach:

Suppose instead we have a simulator: input a state s_t and an action a_t, and it outputs the next state, drawn from the MDP's distribution P_sa:

s_{t+1} ~ P_{s_t a_t}

That lets us keep the state continuous. But how do we get such a simulator?

1. From the underlying physics. In the inverted pendulum problem, the action is the horizontal force applied to the cart; from physics we can solve for the effect of that force on the state, i.e., compute the cart's horizontal acceleration and the pole's angular acceleration, take a small time step, and obtain s_{t+1}.

2. Learn a simulator. Here you can first try controlling the cart yourself to collect a series of data, assume the dynamics are linear or non-linear, and treat s_{t+1} as a function of s_t and a_t. Given the data, a supervised learning algorithm yields this function, which is precisely the simulator. For example, suppose it is a linear function:

s_{t+1} = A s_t + B a_t

In the inverted pendulum problem, A is a 4x4 matrix and B a 4-dimensional vector; adding a little noise turns this into

s_{t+1} = A s_t + B a_t + eps_t, with noise eps_t ~ N(0, Sigma)

Our task is to learn A and B. (That was only the linear assumption; more concretely, if we assume a non-linear model, say by adding features such as the product of the velocity and the angular velocity, or squares, or others, the equation can instead be written

s_{t+1} = A phi(s_t) + B a_t

with a feature map phi.) That makes it non-linear; the task is still to obtain A and B, fitting each parameter with supervised learning, as in the sketch below.
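A minimal sketch of fitting A and B by least squares from logged (s_t, a_t, s_{t+1}) triples; the data here is generated from invented dynamics so the fit can be checked:

import numpy as np

rng = np.random.RandomState(0)
A_true = np.eye(4) + 0.05 * rng.randn(4, 4)      # invented dynamics
B_true = rng.randn(4, 1)

# logged trajectories: states S (m x 4), actions U (m x 1), next states S_next
S = rng.randn(500, 4)
U = rng.randn(500, 1)
S_next = S @ A_true.T + U @ B_true.T + 0.01 * rng.randn(500, 4)

# stack [s_t, a_t] and solve s_{t+1} ~ [A B] [s_t; a_t] by least squares
Z = np.hstack([S, U])                            # (500, 5)
W, *_ = np.linalg.lstsq(Z, S_next, rcond=None)   # (5, 4)
A_hat, B_hat = W[:4].T, W[4:].T                  # recovered A (4x4) and B (4x1)
print(np.max(np.abs(A_hat - A_true)))            # small -> good fit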
Part 4: The value (Q) function over continuous states

Here the course introduces the fitted value (Q) iteration algorithm. In the earlier value iteration algorithm we had

V(s) := R(s) + gamma * max_a sum_{s'} P_sa(s') * V(s') = R(s) + gamma * max_a E_{s' ~ P_sa}[ V(s') ]

where the rewrite uses the definition of expectation. The main idea of fitted value (Q) iteration is to approximate the right-hand side with a parametric function; that is, we let

V(s) = theta^T phi(s)

where phi(s) is a set of features based on s, and we need to find the coefficients theta. The algorithm steps first, then the explanation step by step:

The steps themselves are quite simple; what matters most is the idea. In the loop over actions, for each action we estimate the corresponding quantity R(s^(i)) + gamma * E[V(s')], written q(a); each q(a) refers to the i-th sample. The loop over i = 1 ... m then records the value of the best action, y^(i) = max_a q(a), and the pairs (s^(i), y^(i)) are fed to supervised learning; that is roughly the train of thought. The reward function itself is simple: in the inverted pendulum problem, for instance, the more upright the pole, the higher the reward, so a measure of reward can be read directly off the state.

With all of the above in place, we can compute our policy:

pi(s) = argmax_a E_{s' ~ P_sa}[ V(s') ]

A sketch of the whole loop follows.
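A compact sketch of fitted value iteration. The simulator, the reward, the sample counts, the linear feature map phi(s) = s, and the least-squares regression are all illustrative choices, not the course's exact setup:

import numpy as np

rng = np.random.RandomState(0)
gamma, k, m = 0.9, 10, 200

# Stand-in simulator s' ~ P_sa with invented linear dynamics (not the real pendulum).
A_sim, B_sim = np.eye(4) * 0.9, rng.randn(4)
def simulate(s, a):
    return A_sim @ s + B_sim * a + 0.01 * rng.randn(4)

def reward(s):
    return -abs(s[1])                   # illustrative: pole angle near 0 is good

states = [rng.randn(4) for _ in range(m)]   # sampled states s^(1)..s^(m)
theta = np.zeros(4)                         # V(s) = theta^T phi(s), with phi(s) = s

for _ in range(30):                         # fitted value-iteration sweeps
    y = []
    for s in states:
        # q(a) = R(s) + gamma * (1/k) * sum_j V(s'_j), with s'_j from the simulator
        q = [reward(s) + gamma * np.mean([theta @ simulate(s, a) for _ in range(k)])
             for a in (-1.0, 1.0)]
        y.append(max(q))                    # y^(i) = max_a q(a)
    X = np.array(states)
    theta, *_ = np.linalg.lstsq(X, np.array(y), rcond=None)   # supervised fit of theta

# greedy policy: pick the action whose simulated next states look best under V
policy = lambda s: max((-1.0, 1.0),
                       key=lambda a: np.mean([theta @ simulate(s, a) for _ in range(k)]))
print(theta, policy(rng.randn(4)))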
Part 5: Deterministic models

The continuous-state algorithm above targets a non-deterministic model, where one action can lead to several states, with P deciding which state is reached. A deterministic model is a simplified special case: the samples are simpler and the computation is simpler, because a given state and action lead to exactly one next state rather than several. I won't go into the details of this special case.