What is the Clopper-Pearson method?

Translation request: what does "CI based on the Clopper-Pearson method" mean?
CI based on the Clopper-Pearson method
Suggested translations (essentially equivalent): a CI (confidence interval) based on, i.e. computed by, the Clopper-Pearson method.
[Other] Clopper-Pearson method
1. Could someone explain this method, the Clopper-Pearson method, to me? What exactly does it involve? 2. And what is the Fisher exact test? What, specifically, is it used to test?
From Wikipedia, the free encyclopedia
In statistics, a binomial proportion confidence interval is a confidence interval for a proportion in a statistical population. It uses the proportion estimated in a statistical sample and allows for sampling error. There are several formulas for a binomial confidence interval, but all of them rely on the assumption of a binomial distribution. In general, a binomial distribution applies when an experiment is repeated a fixed number of times, each trial of the experiment has two possible outcomes (labeled arbitrarily success and failure), the probability of success is the same for each trial, and the trials are statistically independent.
A simple example of a binomial distribution is the set of various possible outcomes, and their probabilities, for the number of heads observed when a (not necessarily fair) coin is flipped ten times. The observed binomial proportion is the fraction of the flips which turn out to be heads. Given this observed proportion, the confidence interval for the true proportion innate in that coin is a range of possible proportions which may contain the true proportion. A 95% confidence interval for the proportion, for instance, will contain the true proportion 95% of the times that the procedure for constructing the confidence interval is employed. Note that this does not mean that a calculated 95% confidence interval will contain the true proportion with 95% probability. Instead, one should interpret it as follows: the process of drawing a random sample and calculating an accompanying 95% confidence interval will generate a confidence interval that contains the true proportion in 95% of all cases. If we were to take 100 different samples and compute a 95% confidence interval for each sample, then (approximately) 95 of the 100 confidence intervals will contain the true proportion. In practice, however, we select one random sample and generate one confidence interval, which may or may not contain the true proportion. Each observed interval may over- or underestimate the true proportion.
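To make this frequentist interpretation concrete, here is a minimal simulation sketch (Python with NumPy is assumed; the true proportion, sample size, and replication count are arbitrary illustrative choices, and it uses the normal-approximation interval introduced below):

```python
import numpy as np

rng = np.random.default_rng(0)
p_true, n, trials, z = 0.3, 100, 10_000, 1.96   # arbitrary illustrative values

covered = 0
for _ in range(trials):
    p_hat = rng.binomial(n, p_true) / n              # observed proportion
    half = z * np.sqrt(p_hat * (1 - p_hat) / n)      # normal-approximation half-width
    covered += (p_hat - half) <= p_true <= (p_hat + half)

print(f"empirical coverage: {covered / trials:.3f}")  # near, but not exactly, 0.95
```

With these settings the empirical coverage typically lands near, but not exactly at, 0.95, which previews the coverage issues discussed below.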
There are several ways to compute a confidence interval for a binomial proportion. The normal approximation interval is the simplest formula, and the one introduced in most basic statistics classes and textbooks. This formula, however, is based on an approximation that does not always work well. Several competing formulas are available that perform better, especially for situations with a small sample size and a proportion very close to zero or one. The choice of interval will depend on how important it is to use a simple and easy-to-explain interval versus the desire for better accuracy.
The most commonly used formula for a binomial confidence interval relies on approximating the distribution of error about a binomially-distributed observation, $\hat{p}$, with a normal distribution. However, although this distribution is frequently confused with a binomial distribution, the error distribution itself is not binomial, and hence other methods (below) are preferred.

The approximation is usually justified by the central limit theorem. The formula is

$$\hat{p} \pm z \sqrt{\frac{1}{n}\hat{p}\left(1-\hat{p}\right)}$$

or, equivalently,

$$\frac{1}{n}\left[n_S \pm z\sqrt{\frac{1}{n} n_S n_F}\right]$$

where $\hat{p} = n_S/n$ is the proportion of successes in a Bernoulli trial process with $n$ trials yielding $n_S$ successes and $n_F = n - n_S$ failures, and $z$ is the $1 - \tfrac{1}{2}\alpha$ quantile of a standard normal distribution corresponding to the target error rate $\alpha$. For example, for a 95% confidence level the error $\alpha = 0.05$, so $1 - \tfrac{1}{2}\alpha = 0.975$ and $z = 1.96$.
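A minimal sketch of this formula in Python (SciPy's norm.ppf supplies the quantile $z$; the function name and example inputs are ours):

```python
from math import sqrt
from scipy.stats import norm

def normal_approx_interval(n_s: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Normal-approximation (Wald) interval for a binomial proportion."""
    p_hat = n_s / n
    z = norm.ppf(1 - alpha / 2)                   # e.g. 1.96 for alpha = 0.05
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

print(normal_approx_interval(35, 100))            # e.g. 35 successes in 100 trials
```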
The central limit theorem applies poorly to this distribution with a sample size less than 30 or where the proportion is close to 0 or 1. The normal approximation fails totally when the sample proportion is exactly zero or exactly one. A frequently cited rule of thumb is that the normal approximation is a reasonable one as long as $n_S > 5$ and $n_F > 5$; however, even this is unreliable in many cases; see Brown et al. 2001.
An important theoretical derivation of this confidence interval involves the inversion of a hypothesis test. Under this formulation, the confidence interval represents those values of the population parameter that would have large p-values if they were tested as a hypothesized population proportion. The collection of values, $\theta$, for which the normal approximation is valid can be represented as

$$\left\{\theta \,\middle|\, y \le \frac{\hat{p} - \theta}{\sqrt{\frac{1}{n}\hat{p}\left(1-\hat{p}\right)}} \le z\right\}$$

where $y$ is the $\tfrac{1}{2}\alpha$ quantile of a standard normal distribution. Since the test in the middle of the inequality is a Wald test, the normal approximation interval is sometimes called the Wald interval, but Pierre-Simon Laplace first described it in his 1812 book Théorie analytique des probabilités (page 283).
The Wilson interval is an improvement (the actual coverage probability is closer to the nominal value) over the normal approximation interval, and was first developed by Edwin Bidwell Wilson (1927):

$$\frac{1}{1 + \frac{1}{n}z^2}\left[\hat{p} + \frac{1}{2n}z^2 \pm z\sqrt{\frac{1}{n}\hat{p}\left(1-\hat{p}\right) + \frac{1}{4n^2}z^2}\right]$$

or, equivalently,

$$\frac{1}{n + z^2}\left[n_S + \frac{1}{2}z^2 \pm z\sqrt{\frac{1}{n}n_S n_F + \frac{1}{4}z^2}\right]$$

This interval has good properties even for a small number of trials and/or an extreme probability.
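A direct transcription of the first form above into Python (again a sketch; the function name is ours):

```python
from math import sqrt
from scipy.stats import norm

def wilson_interval(n_s: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Wilson score interval, following the formula above."""
    p_hat = n_s / n
    z = norm.ppf(1 - alpha / 2)
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half, center + half

print(wilson_interval(0, 10))   # remains sensible even with zero observed successes
```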
These properties are obtained from its derivation from the binomial model. Consider a binomial population probability $P$, whose distribution may be approximated by the normal distribution with standard deviation $\sqrt{\frac{1}{n}P\left(1-P\right)}$. However, the distribution of true values about an observation is not binomial. Rather, an observation $\hat{p}$ will have an error interval with a lower bound equal to $P$ when $\hat{p}$ is at the equivalent normal interval upper bound (i.e. for the same $\alpha$) of $P$, and vice versa.
The Wilson interval can also be derived from Pearson's chi-squared test with two categories. The resulting interval

$$\left\{\theta \,\middle|\, y \le \frac{\hat{p} - \theta}{\sqrt{\frac{1}{n}\theta\left(1-\theta\right)}} \le z\right\}$$

can then be solved for $\theta$ to produce the Wilson interval. The test in the middle of the inequality is a score test, so the Wilson interval is sometimes called the Wilson score interval.
The center of the Wilson interval

$$\frac{\hat{p} + \frac{1}{2n}z^2}{1 + \frac{1}{n}z^2}$$

can be shown to be a weighted average of $\hat{p} = \tfrac{n_S}{n}$ and $\tfrac{1}{2}$, with $\hat{p}$ receiving greater weight as the sample size increases. For the 95% interval, the Wilson interval is nearly identical to the normal approximation interval using $\tilde{p} = \tfrac{n_S + 2}{n + 4}$ instead of $\hat{p}$.
The Wilson interval may be modified by employing a continuity correction, in order to align the minimum coverage probability (rather than the average) with the nominal value. Just as the Wilson interval mirrors Pearson's chi-squared test, the Wilson interval with continuity correction mirrors the equivalent Yates' chi-squared test. The following formulae for the lower and upper bounds of the Wilson score interval with continuity correction, $(w^-, w^+)$, are derived from Newcombe (1998):

$$\begin{aligned} w^- &= \max\left\{0,\; \frac{2n\hat{p} + z^2 - \left[z\sqrt{z^2 - \frac{1}{n} + 4n\hat{p}(1-\hat{p}) + (4\hat{p} - 2)} + 1\right]}{2(n + z^2)}\right\} \\ w^+ &= \min\left\{1,\; \frac{2n\hat{p} + z^2 + \left[z\sqrt{z^2 - \frac{1}{n} + 4n\hat{p}(1-\hat{p}) - (4\hat{p} - 2)} + 1\right]}{2(n + z^2)}\right\} \end{aligned}$$
However, if $p = 0$, then $w^-$ must be taken as 0; if $p = 1$, then $w^+$ is 1.
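A minimal Python sketch of these bounds (the function name is ours, and the early branches implement the $p = 0$ and $p = 1$ adjustments just noted, which also keeps the square roots well-defined):

```python
from math import sqrt
from scipy.stats import norm

def wilson_cc_interval(n_s: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Wilson score interval with continuity correction (Newcombe 1998)."""
    p = n_s / n
    z = norm.ppf(1 - alpha / 2)
    if n_s == 0:                       # boundary case: w- is taken as 0
        lower = 0.0
    else:
        lower = max(0.0, (2*n*p + z*z
                          - (z * sqrt(z*z - 1/n + 4*n*p*(1 - p) + (4*p - 2)) + 1))
                         / (2 * (n + z*z)))
    if n_s == n:                       # boundary case: w+ is taken as 1
        upper = 1.0
    else:
        upper = min(1.0, (2*n*p + z*z
                          + (z * sqrt(z*z - 1/n + 4*n*p*(1 - p) - (4*p - 2)) + 1))
                         / (2 * (n + z*z)))
    return lower, upper

print(wilson_cc_interval(35, 100))
```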
The Jeffreys interval has a Bayesian derivation, but it has good frequentist properties. In particular, it has coverage properties that are similar to the Wilson interval, but it is one of the few intervals with the advantage of being equal-tailed (e.g., for a 95% confidence interval, the probabilities of the interval lying above or below the true value are both close to 2.5%). In contrast, the Wilson interval has a systematic bias such that it is centred too close to p = 0.5.
The Jeffreys interval is the Bayesian credible interval obtained when using the non-informative Jeffreys prior for the binomial proportion p. The Jeffreys prior is a Beta distribution with parameters (1/2, 1/2). After observing x successes in n trials, the posterior distribution for p is a Beta distribution with parameters (x + 1/2, n − x + 1/2).
When x ≠ 0 and x ≠ n, the Jeffreys interval is taken to be the 100(1 − α)% equal-tailed posterior probability interval, i.e., the α/2 and 1 − α/2 quantiles of a Beta distribution with parameters (x + 1/2, n − x + 1/2). These quantiles need to be computed numerically, although this is reasonably simple with modern statistical software.
In order to avoid the coverage probability tending to zero when p → 0 or 1, when x = 0 the upper limit is calculated as before but the lower limit is set to 0, and when x = n the lower limit is calculated as before but the upper limit is set to 1.
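A sketch of the Jeffreys interval using SciPy's beta quantile function, with the boundary adjustments just described (the function name is ours):

```python
from scipy.stats import beta

def jeffreys_interval(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Equal-tailed posterior interval under the Jeffreys Beta(1/2, 1/2) prior."""
    lower = 0.0 if x == 0 else beta.ppf(alpha / 2, x + 0.5, n - x + 0.5)
    upper = 1.0 if x == n else beta.ppf(1 - alpha / 2, x + 0.5, n - x + 0.5)
    return lower, upper

print(jeffreys_interval(0, 20))   # lower limit pinned to 0 when x = 0
```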
The Clopper-Pearson interval is an early and very common method for calculating binomial confidence intervals. This is often called an 'exact' method, because it is based on the cumulative probabilities of the binomial distribution (i.e., exactly the correct distribution rather than an approximation). However, the intervals are not exact in the way one might assume: the discontinuous nature of the binomial distribution precludes any interval with exact coverage for all population proportions. The Clopper-Pearson interval can be written as

$$S_{\le} \cap S_{\ge}$$

or equivalently

$$\left(\inf S_{\ge}\,,\, \sup S_{\le}\right)$$

with

$$S_{\le} := \left\{\theta \,\middle|\, P\left[\mathrm{Bin}\left(n;\theta\right) \le X\right] > \frac{\alpha}{2}\right\} \quad\text{and}\quad S_{\ge} := \left\{\theta \,\middle|\, P\left[\mathrm{Bin}\left(n;\theta\right) \ge X\right] > \frac{\alpha}{2}\right\},$$

where 0 ≤ X ≤ n is the number of successes observed in the sample and Bin(n; θ) is a binomial random variable with n trials and probability of success θ.
Because of a relationship between the binomial distribution and the beta distribution, the Clopper-Pearson interval is sometimes presented in an alternate format that uses quantiles from the beta distribution:

$$B\!\left(\frac{\alpha}{2};\, x,\, n - x + 1\right) < \theta < B\!\left(1 - \frac{\alpha}{2};\, x + 1,\, n - x\right)$$

where x is the number of successes, n is the number of trials, and B(p; v, w) is the pth quantile from a beta distribution with shape parameters v and w.
When $x$ is either $0$ or $n$, closed-form expressions for the interval bounds are available: when $x = 0$ the interval is $\left(0,\, 1 - \left(\tfrac{\alpha}{2}\right)^{\frac{1}{n}}\right)$ and when $x = n$ it is $\left(\left(\tfrac{\alpha}{2}\right)^{\frac{1}{n}},\, 1\right)$.
The beta distribution is, in turn, related to the F-distribution, so a third formulation of the Clopper-Pearson interval can be written using F quantiles:

$$\left(1 + \frac{n - x + 1}{x\, F\!\left[\frac{\alpha}{2};\, 2x,\, 2(n - x + 1)\right]}\right)^{-1} < \theta < \left(1 + \frac{n - x}{(x + 1)\, F\!\left[1 - \frac{\alpha}{2};\, 2(x + 1),\, 2(n - x)\right]}\right)^{-1}$$

where x is the number of successes, n is the number of trials, and F(c; d1, d2) is the c quantile from an F-distribution with d1 and d2 degrees of freedom.
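The beta-quantile formulation translates directly into code. A minimal sketch (SciPy assumed; the function name is ours), with the x = 0 and x = n closed-form cases handled by pinning the corresponding bound:

```python
from scipy.stats import beta

def clopper_pearson_interval(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Clopper-Pearson interval via the beta-quantile formulation above."""
    lower = 0.0 if x == 0 else beta.ppf(alpha / 2, x, n - x + 1)
    upper = 1.0 if x == n else beta.ppf(1 - alpha / 2, x + 1, n - x)
    return lower, upper

print(clopper_pearson_interval(35, 100))
```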
The Clopper-Pearson interval is an exact interval since it is based directly on the binomial distribution rather than any approximation to it. This interval never has less than the nominal coverage for any population proportion, but that means it is usually conservative. For example, the true coverage rate of a 95% Clopper-Pearson interval may be well above 95%, depending on n and θ; thus the interval may be wider than it needs to be to achieve 95% confidence. In contrast, other confidence bounds may be narrower than their nominal confidence width: the Normal Approximation (or "Standard") interval, Wilson interval, Agresti-Coull interval, etc., with a nominal coverage of 95% may in fact cover less than 95%.
The Agresti-Coull interval is another approximate binomial confidence interval. Given $X$ successes in $n$ trials, define

$$\tilde{n} = n + z^2 \quad\text{and}\quad \tilde{p} = \frac{1}{\tilde{n}}\left(X + \frac{1}{2}z^2\right)$$

Then, a confidence interval for $p$ is given by

$$\tilde{p} \pm z\sqrt{\frac{1}{\tilde{n}}\tilde{p}\left(1 - \tilde{p}\right)}$$

where $z$ is the $1 - \frac{1}{2}\alpha$ quantile of a standard normal distribution, as before. For example, for a 95% confidence interval, let $\alpha = 0.05$, so that $z = 1.96$ and $z^2 = 3.84$. If we use 2 instead of 1.96 for $z$, this is the "add 2 successes and 2 failures" interval of Agresti and Coull (1998).
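A sketch of the Agresti-Coull computation (the function name is ours):

```python
from math import sqrt
from scipy.stats import norm

def agresti_coull_interval(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Agresti-Coull interval: a Wald-style interval around an adjusted estimate."""
    z = norm.ppf(1 - alpha / 2)
    n_tilde = n + z**2
    p_tilde = (x + z**2 / 2) / n_tilde
    half = z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde - half, p_tilde + half

print(agresti_coull_interval(35, 100))
```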
Let X be the number of successes in n trials and let p = X/n. The variance of p is

$$\operatorname{var}(p) = \frac{p(1-p)}{n}$$

Using the arcsine transform, the variance of the arcsine of the square root of p is

$$\operatorname{var}\left(\arcsin\left(\sqrt{p}\right)\right) \approx \frac{\operatorname{var}(p)}{4p(1-p)} = \frac{p(1-p)}{4np(1-p)} = \frac{1}{4n}$$

So the confidence interval itself has the form

$$\sin^2\left(\arcsin\left(\sqrt{p}\right) - \frac{z}{2\sqrt{n}}\right) < \theta < \sin^2\left(\arcsin\left(\sqrt{p}\right) + \frac{z}{2\sqrt{n}}\right)$$

where $z$ is the $1 - \frac{\alpha}{2}$ quantile of a standard normal distribution. This method may be used to estimate the variance of p, but its use is problematic when p is close to 0 or 1.
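A sketch of the arcsine-based interval (the function name is ours; the clamping of the transformed bounds to [0, π/2] is our own guard for proportions near 0 or 1, not part of the formula above):

```python
from math import asin, pi, sin, sqrt
from scipy.stats import norm

def arcsine_interval(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Interval from the arcsine variance-stabilising transform."""
    p = x / n
    z = norm.ppf(1 - alpha / 2)
    centre = asin(sqrt(p))
    half = z / (2 * sqrt(n))
    lo = max(0.0, centre - half)      # clamp to the domain [0, pi/2]
    hi = min(pi / 2, centre + half)
    return sin(lo) ** 2, sin(hi) ** 2

print(arcsine_interval(35, 100))
```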
Let p be the proportion of successes. For 0 ≤ a ≤ 2, define

$$t_a = \log\left(\frac{p^a}{(1-p)^{2-a}}\right) = a\log(p) - (2 - a)\log(1 - p)$$

This family is a generalisation of the logit transform, which is a special case with a = 1, and can be used to transform a proportional data distribution to an approximately normal distribution. The parameter a has to be estimated for the data set.
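For illustration, a tiny sketch of the $t_a$ family, checking that a = 1 reproduces the logit (the function name is ours):

```python
from math import log

def t_a(p: float, a: float) -> float:
    """Family of transforms t_a(p) = a*log(p) - (2 - a)*log(1 - p), 0 <= a <= 2."""
    return a * log(p) - (2 - a) * log(1 - p)

print(t_a(0.3, 1.0))        # a = 1 is the logit transform
print(log(0.3 / 0.7))       # same value
```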
In medicine, the rule of three is used to provide a simple way of stating an approximate 95% confidence interval for p in the special case that no successes ($\hat{p} = 0$) have been observed. The interval is (0, 3/n). By symmetry, in the case of only successes ($\hat{p} = 1$), the interval is (1 − 3/n, 1).
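A trivial sketch of the rule of three, with a comparison against the exact Clopper-Pearson upper bound for the same data (the function name is ours):

```python
def rule_of_three(n: int) -> tuple[float, float]:
    """Approximate 95% interval for p when 0 successes are observed in n trials."""
    return 0.0, 3 / n

print(rule_of_three(30))    # (0.0, 0.1)
# For comparison, the exact Clopper-Pearson upper bound 1 - (alpha/2)**(1/n)
# for n = 30, alpha = 0.05 is about 0.116.
```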
There are several research papers that compare these and other confidence intervals for the binomial proportion. Both Agresti and Coull (1998) and Ross (2003) point out that exact methods such as the Clopper-Pearson interval may not work as well as certain approximations.
Many of these intervals can be calculated in R using packages such as binom.
Sullivan, Lisa (2016).
Wallis, Sean A. (2013). "Binomial confidence intervals and contingency tests: mathematical fundamentals and the evaluation of alternative methods" (PDF). Journal of Quantitative Linguistics. 20 (3): 178–208.
Brown, Lawrence D.; Cai, T. Tony; DasGupta, Anirban (2001). "Interval Estimation for a Binomial Proportion". Statistical Science. 16 (2): 101–133.
Wilson, E. B. (1927). "Probable inference, the law of succession, and statistical inference". Journal of the American Statistical Association. 22: 209–212.
Newcombe, R. G. (1998). "Two-sided confidence intervals for the single proportion: comparison of seven methods". Statistics in Medicine. 17 (8): 857–872.
Cai, T. T. (2005). "One-sided confidence intervals in discrete distributions". Journal of Statistical Planning and Inference. 131: 63–88.
Clopper, C.; Pearson, E. S. (1934). "The use of confidence or fiducial limits illustrated in the case of the binomial". Biometrika. 26: 404–413.
Thulin, Måns (2014). "The cost of using exact confidence intervals for a binomial proportion". Electronic Journal of Statistics. 8 (1): 817–840.
Agresti, Alan; Coull, Brent A. (1998). "Approximate is better than 'exact' for interval estimation of binomial proportions". The American Statistician. 52: 119–126.
Shao, J. (1998). Mathematical Statistics. Springer: New York, New York, USA.
Simon, Steve (2010). The Children's Mercy Hospital, Kansas City, Mo. ("Ask Professor Mean"; archived October 15, 2011).
Reiczigel, J. (2003). "Confidence intervals for the binomial parameter: some new considerations" (PDF). Statistics in Medicine. 22: 611–621.
Sauro, J.; Lewis, J. R. (2005). "Estimating completion rates from small samples using binomial confidence intervals: comparisons and recommendations". Proceedings of the Human Factors and Ergonomics Society 49th Annual Meeting (HFES 2005), Orlando, FL.
Ross, T. D. (2003). "Accurate confidence intervals for binomial proportion and Poisson rate estimation". Computers in Biology and Medicine. 33: 509–531.