
Apache Hadoop 2.9.0 – ResourceManager High Availability
Version: 2.9.0

ResourceManager High Availability
Introduction
This guide provides an overview of High Availability of YARN’s ResourceManager, and details how to configure and use this feature. The ResourceManager (RM) is responsible for tracking the resources in a cluster and scheduling applications (e.g., MapReduce jobs). Prior to Hadoop 2.4, the ResourceManager was the single point of failure in a YARN cluster. The High Availability feature adds redundancy in the form of an Active/Standby ResourceManager pair to remove this otherwise single point of failure.
Architecture
RM Failover
ResourceManager HA is realized through an Active/Standby architecture - at any point of time, one of the RMs is Active, and one or more RMs are in Standby mode waiting to take over should anything happen to the Active. The trigger to transition-to-active comes from either the admin (through CLI) or through the integrated failover-controller when automatic-failover is enabled.
Manual transitions and failover
When automatic failover is not enabled, admins have to manually transition one of the RMs to Active. To fail over from one RM to the other, they are expected to first transition the Active RM to Standby and then transition a Standby RM to Active. All of this can be done using the “yarn rmadmin” CLI.
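For example, a manual failover from rm1 to rm2 could look like the following sketch (the rm1/rm2 logical IDs are the ones used in the sample configuration later in this document):
$ yarn rmadmin -transitionToStandby rm1
$ yarn rmadmin -transitionToActive rm2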
Automatic failover
The RMs have an option to embed the ZooKeeper-based ActiveStandbyElector to decide which RM should be the Active. When the Active goes down or becomes unresponsive, another RM is automatically elected to be the Active and takes over. Note that there is no need to run a separate ZKFC daemon as is the case for HDFS, because the ActiveStandbyElector embedded in the RMs acts as both failure detector and leader elector instead of a separate ZKFC daemon.
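As a sketch, automatic failover with the embedded elector can be made explicit in yarn-site.xml; both properties default to enabled when RM HA is enabled, so setting them is optional:
<property>
  <!-- Automatically elect an Active RM instead of relying on manual transitions -->
  <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Use the ActiveStandbyElector embedded in the RM (no separate ZKFC daemon) -->
  <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
  <value>true</value>
</property>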
Client, ApplicationMaster and NodeManager on RM failover
When there are multiple RMs, the configuration (yarn-site.xml) used by clients and nodes is expected to list all the RMs. Clients, ApplicationMasters (AMs) and NodeManagers (NMs) try connecting to the RMs in a round-robin fashion until they hit the Active RM. If the Active goes down, they resume the round-robin polling until they hit the “new” Active. This default retry logic is implemented as org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider. You can override the logic by implementing org.apache.hadoop.yarn.client.RMFailoverProxyProvider and setting the value of yarn.client.failover-proxy-provider to the class name.
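For example, a custom provider could be wired in through yarn-site.xml as in the sketch below; com.example.MyRMFailoverProxyProvider is a hypothetical class name used purely for illustration and would have to implement org.apache.hadoop.yarn.client.RMFailoverProxyProvider:
<property>
  <!-- Hypothetical custom failover proxy provider for Clients, AMs and NMs -->
  <name>yarn.client.failover-proxy-provider</name>
  <value>com.example.MyRMFailoverProxyProvider</value>
</property>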
Recovering previous active-RM’s state
With the ResourceManager Restart feature enabled, the RM being promoted to an active state loads the RM internal state and continues to operate from where the previous active left off, as much as possible depending on the RM restart feature. A new attempt is spawned for each managed application previously submitted to the RM. Applications can checkpoint periodically to avoid losing any work. The state-store must be visible from both the Active and Standby RMs. Currently, there are two RMStateStore implementations for persistence: FileSystemRMStateStore and ZKRMStateStore. The ZKRMStateStore implicitly allows write access to a single RM at any point in time, and hence is the recommended store to use in an HA cluster. When using the ZKRMStateStore, there is no need for a separate fencing mechanism to address a potential split-brain situation where multiple RMs could assume the Active role. When using the ZKRMStateStore, it is advisable to NOT set the “zookeeper.DigestAuthenticationProvider.superDigest” property on the ZooKeeper cluster, to ensure that the ZooKeeper admin does not have access to YARN application/user credential information.
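A sketch of the corresponding yarn-site.xml settings for state recovery with the ZKRMStateStore; the property names come from the ResourceManager Restart feature, and the ZooKeeper quorum address is just an example:
<property>
  <!-- Recover RM internal state on restart/failover -->
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Persist RM state in ZooKeeper; the recommended store in an HA cluster -->
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>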
Deployment
Configurations
Most of the failover functionality is tunable using various configuration properties. Following is a list of the required/important ones; yarn-default.xml carries the full list of knobs and their default values. See the ResourceManager Restart documentation for instructions on setting up the state-store.
Configuration properties and their descriptions:

yarn.resourcemanager.zk-address
    Address of the ZK quorum. Used both for the state-store and embedded leader-election.
yarn.resourcemanager.ha.enabled
    Enable RM HA.
yarn.resourcemanager.ha.rm-ids
    List of logical IDs for the RMs, e.g., “rm1,rm2”.
yarn.resourcemanager.hostname.rm-id
    For each rm-id, specify the hostname the RM corresponds to. Alternatively, one could set each of the RM’s service addresses.
yarn.resourcemanager.address.rm-id
    For each rm-id, specify the host:port for clients to submit jobs. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id.
yarn.resourcemanager.scheduler.address.rm-id
    For each rm-id, specify the scheduler host:port for ApplicationMasters to obtain resources. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id.
yarn.resourcemanager.resource-tracker.address.rm-id
    For each rm-id, specify the host:port for NodeManagers to connect. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id.
yarn.resourcemanager.admin.address.rm-id
    For each rm-id, specify the host:port for administrative commands. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id.
yarn.resourcemanager.webapp.address.rm-id
    For each rm-id, specify the host:port of the RM web application. You do not need this if you set yarn.http.policy to HTTPS_ONLY. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id.
yarn.resourcemanager.webapp.https.address.rm-id
    For each rm-id, specify the host:port of the RM HTTPS web application. You do not need this if you set yarn.http.policy to HTTP_ONLY. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id.
yarn.resourcemanager.ha.id
    Identifies the RM in the ensemble. This is optional; however, if set, admins have to ensure that all the RMs have their own IDs in the config.
yarn.resourcemanager.ha.automatic-failover.enabled
    Enable automatic failover; by default, it is enabled only when HA is enabled.
yarn.resourcemanager.ha.automatic-failover.embedded
    Use the embedded leader-elector to pick the Active RM when automatic failover is enabled. By default, it is enabled only when HA is enabled.
yarn.resourcemanager.cluster-id
    Identifies the cluster. Used by the elector to ensure an RM doesn’t take over as Active for another cluster.
yarn.client.failover-proxy-provider
    The class to be used by Clients, AMs and NMs to fail over to the Active RM.
yarn.client.failover-max-attempts
    The maximum number of times the FailoverProxyProvider should attempt failover.
yarn.client.failover-sleep-base-ms
    The sleep base (in milliseconds) used for calculating the exponential delay between failovers.
yarn.client.failover-sleep-max-ms
    The maximum sleep time (in milliseconds) between failovers.
yarn.client.failover-retries
    The number of retries per attempt to connect to a ResourceManager.
yarn.client.failover-retries-on-socket-timeouts
    The number of retries per attempt to connect to a ResourceManager on socket timeouts.
Sample configurations
Here is a sample minimal setup for RM failover.
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>cluster1</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>master1</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>master2</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>master1:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>master2:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>
Admin commands
yarn rmadmin has a few HA-specific command options to check the health/state of an RM and to transition it to Active or Standby. The HA commands take the service ID of the RM, as set by yarn.resourcemanager.ha.rm-ids, as an argument.
$ yarn rmadmin -getServiceState rm1
$ yarn rmadmin -getServiceState rm2
If automatic failover is enabled, you cannot use the manual transition commands. Although you can override this with the --forcemanual flag, use it with caution.
$ yarn rmadmin -transitionToStandby rm1
Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@1d8299fd
Refusing to manually manage HA state, since it may cause
a split-brain scenario or other incorrect state.
If you are very sure you know what you are doing, please
specify the forcemanual flag.
See the yarn rmadmin command documentation for more details.
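If a manual transition really is required while automatic failover is on, the command can be forced. This is only a sketch; it bypasses the embedded elector, so use it only when you are certain of the state of the other RM:
$ yarn rmadmin -transitionToStandby --forcemanual rm1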
ResourceManager Web UI services
Assuming a standby RM is up and running, the Standby automatically redirects all web requests to the Active, except for the “About” page.
Web Services
Assuming a standby RM is up and running, the RM web services (the ResourceManager REST APIs), when invoked on a standby RM, are automatically redirected to the Active RM.
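For example, querying a standby RM’s REST endpoint should transparently be served by the Active. The sketch below assumes the master2 hostname from the sample configuration and uses the standard /ws/v1/cluster/info endpoint; curl’s -L flag follows the redirect, and the JSON response should report the cluster’s haState:
$ curl -L http://master2:8088/ws/v1/cluster/info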
