Confidence interval. What is it and how can you use it?


2019-08-15 09:00:28





The confidence interval comes from statistics. It is a range used to estimate an unknown parameter with a high degree of reliability. The idea is easiest to explain with an example.

Suppose you want to study a random variable, for example the server's response time to a client request. Each time a user types the address of a particular website, the server responds with a different delay, so the response time is random. A confidence interval lets you set bounds for the mean of this variable. You can then say, with 95% confidence, that the server's mean response time lies in the calculated range.

Or suppose you need to know how many people are aware of a company's brand. Once the confidence interval is computed, you can say, for example, that with 95% confidence the share of consumers who know the brand lies between 27% and 34%.

Closely related to this term is the confidence level (confidence probability): the probability that the interval covers the true value of the parameter. It also determines how wide the interval is. The higher the confidence level, the wider the confidence interval, and vice versa. It is usually set to 90%, 95% or 99%. 95% is the most common choice.
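To see how the confidence level drives the width, here is a minimal Python sketch. It assumes the known-variance (normal) case and uses the standard library's `statistics.NormalDist` to get the two-sided critical value t for each level. The function names and the values sigma = 10 and n = 25 are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist


def critical_value(confidence: float) -> float:
    """Two-sided critical value t of the standard normal distribution."""
    # A two-sided interval leaves (1 - confidence) / 2 in each tail,
    # so t is the quantile at 0.5 + confidence / 2.
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def interval_width(confidence: float, sigma: float, n: int) -> float:
    """Full width 2 * t * sigma / sqrt(n) of the known-variance interval."""
    return 2 * critical_value(confidence) * sigma / n ** 0.5


# Illustration values (hypothetical): sigma = 10, n = 25.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: t = {critical_value(level):.3f}, "
          f"width = {interval_width(level, 10, 25):.2f}")
```

The critical values come out to about 1.645, 1.960 and 2.576. With the same data, the interval gets wider as the confidence level rises.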

The width of the interval also depends on the variance of the observations and on the sample size. More spread widens it, and a larger sample narrows it. The formulas below assume that the characteristic under study follows the normal (Gaussian) distribution, a continuous distribution described by its probability density. If the normality assumption is wrong, the estimate may be inaccurate.


First, let's see how to compute a confidence interval for the expected value (the mean). Two cases are possible. The variance (the degree of spread of a random variable) is either known or unknown. If it is known, the confidence interval is calculated with the following formula:

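$$\bar{x} - t\frac{\sigma}{\sqrt{n}} \le \alpha \le \bar{x} + t\frac{\sigma}{\sqrt{n}},$$ where

$\bar{x}$ – the sample mean,

$\alpha$ – the unknown true mean of the characteristic,

$t$ – the value from the table of the Laplace function for which $\Phi(t) = \gamma/2$, where $\gamma$ is the confidence level (for $\gamma = 0.95$, $t = 1.96$),

$n$ – the sample size,

$\sigma$ – the known standard deviation (the square root of the variance).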
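The known-variance formula translates directly into code. The following is a minimal Python sketch, assuming σ really is known in advance. `z_interval` is a hypothetical helper name, and the server figures in the usage lines are made up. The Laplace-table condition Φ(t) = γ/2 is equivalent to taking the standard normal quantile at 0.5 + γ/2, which `statistics.NormalDist().inv_cdf` provides.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean: float, sigma: float, n: int, confidence: float = 0.95):
    """Interval for the true mean when the standard deviation sigma is known."""
    # Table value t with Phi(t) = confidence / 2 for the Laplace function,
    # i.e. the standard normal quantile at 0.5 + confidence / 2.
    t = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = t * sigma / sqrt(n)
    return mean - margin, mean + margin


# Hypothetical server data: mean response time 200 ms over 64 requests,
# with a standard deviation of 40 ms assumed known in advance.
low, high = z_interval(200.0, 40.0, 64)
print(f"95% interval for the mean: {low:.1f} .. {high:.1f} ms")
```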

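If the variance is unknown, it can be estimated from the observed values of the characteristic with the following formula:

$$D = \overline{x^2} - (\bar{x})^2,$$ where

$\overline{x^2}$ – the mean of the squared values of the characteristic,

$(\bar{x})^2$ – the square of its mean value.

For the Student interval, the corrected (unbiased) estimate $s^2 = \frac{n}{n-1} D$ is normally used. For large samples the difference is negligible.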
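Here is a minimal Python sketch of the variance formula, returning both the "mean of squares minus square of the mean" value D and the corrected estimate s² = n/(n − 1)·D. The function name `variance_estimates` and the data are hypothetical. The results are cross-checked against the standard library's `statistics.pvariance` and `statistics.variance`. The mean-of-squares form is fine for illustration but can lose precision when the values are large compared with their spread. The `statistics` module computes the variance more carefully.

```python
import statistics


def variance_estimates(values):
    """Return (D, s2) for a list of observations.

    D  = mean of squares - square of mean (divides by n),
    s2 = n / (n - 1) * D, the corrected (unbiased) sample variance.
    """
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(x * x for x in values) / n
    d = mean_of_squares - mean ** 2
    s2 = d * n / (n - 1)
    return d, s2


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
d, s2 = variance_estimates(data)
# Cross-check against the standard library; each pair should agree.
print(d, statistics.pvariance(data))
print(s2, statistics.variance(data))
```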

In this case the confidence interval is computed with a slightly different formula:

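$$\bar{x} - t\frac{s}{\sqrt{n}} \le \alpha \le \bar{x} + t\frac{s}{\sqrt{n}},$$ where

$\bar{x}$ – the sample mean,

$\alpha$ – the unknown true mean of the characteristic,

$t = t(\gamma;\, n-1)$ – the critical value from the table of Student's t-distribution for confidence level $\gamma$ and $n-1$ degrees of freedom,

$n$ – the sample size,

$s$ – the sample standard deviation (the square root of the sample variance).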
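When the raw observations are available, the Student interval can be sketched in Python as follows. The sketch assumes the critical value t(γ; n − 1) has already been looked up in a table and is passed in. `t_interval` and the response-time data are hypothetical. `statistics.stdev` divides by n − 1, so it gives the corrected standard deviation s.

```python
from math import sqrt
from statistics import mean, stdev


def t_interval(values, t_crit):
    """Interval for the true mean when the variance is unknown.

    t_crit is t(gamma; n - 1), looked up in a Student's t table for the
    chosen confidence level gamma and n - 1 degrees of freedom.
    """
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation, divides by n - 1
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical response times (ms) of 5 requests; t(0.95; 4) = 2.776.
times = [210, 195, 230, 205, 220]
low, high = t_interval(times, 2.776)
print(f"95% interval for the mean: {low:.1f} .. {high:.1f} ms")
```

If SciPy is installed, `scipy.stats.t.ppf(0.5 + gamma / 2, n - 1)` returns the table value directly, so it does not have to be hard-coded.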

Consider an example. Suppose 7 measurements give a sample mean of 30 for the characteristic under study and a sample variance of 36. We need a 99% confidence interval that contains the true value of the measured parameter.

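First, find the value of t: $t = t(0.99;\, 7-1) = 3.71$. The sample standard deviation is $s = \sqrt{36} = 6$. Substituting into the formula above, we get:

$$\bar{x} - t\frac{s}{\sqrt{n}} \le \alpha \le \bar{x} + t\frac{s}{\sqrt{n}}$$

$$30 - \frac{3.71 \cdot 6}{\sqrt{7}} \le \alpha \le 30 + \frac{3.71 \cdot 6}{\sqrt{7}}$$

$$21.587 \le \alpha \le 38.413$$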
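As a quick check of the arithmetic, this Python snippet recomputes the example from its summary figures. The formula takes the standard deviation s = √36 = 6, not the variance itself. Using the rounded table value 3.71 (more precisely about 3.707) reproduces the bounds above.

```python
from math import sqrt

n = 7            # number of measurements
x_bar = 30.0     # sample mean
s2 = 36.0        # sample variance
t_crit = 3.71    # t(0.99; 6) from a Student's t table (rounded)

s = sqrt(s2)                   # the formula needs the standard deviation, s = 6
margin = t_crit * s / sqrt(n)  # half-width of the interval
low, high = x_bar - margin, x_bar + margin
print(f"{low:.3f} <= alpha <= {high:.3f}")  # 21.587 <= alpha <= 38.413
```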

A confidence interval can also be built for the variance. This works both when the mean is known and when it is unknown and only an unbiased point estimate of the variance is available. We do not give those formulas here because they are more involved. If needed, they are easy to find online.

Note that a confidence interval is convenient to compute in Excel or with an online service. In Excel, the CONFIDENCE.NORM and CONFIDENCE.T functions return the half-width of the interval.

