How is standard deviation measured? Dispersion, root mean square (standard) deviation, coefficient of variation



81. Standard deviation: calculation method and application.

A rough way to assess the variability of a variation series is to determine its limit and amplitude, but these ignore the values of the variants inside the series. The main generally accepted measure of the variability of a quantitative characteristic within a variation series is the standard deviation (σ, sigma). The larger the standard deviation, the greater the fluctuation of the values in the series.

The method for calculating the standard deviation includes the following steps:

1. Find the arithmetic mean (M).

2. Determine the deviations of individual variants from the arithmetic mean (d = V − M). In medical statistics, deviations from the mean are denoted d (deviate). The sum of all deviations, taking frequencies into account, is zero.

3. Square each deviation: d².

4. Multiply the squared deviations by the corresponding frequencies: d²·p.

5. Find the sum of these products: Σd²p.

6. Calculate the standard deviation using the formula:

$$\sigma = \sqrt{\frac{\sum d^2 p}{n}}$$

when n is greater than 30, or

$$\sigma = \sqrt{\frac{\sum d^2 p}{n - 1}}$$

when n is less than or equal to 30, where n is the total number of variants.
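The six steps above can be sketched in Python for a weighted variation series. This is a minimal sketch, assuming the variants V and their frequencies p are given as two parallel lists; the function name `weighted_sd` and the pulse-rate data are hypothetical illustrations, not values from the source.

```python
import math


def weighted_sd(variants, freqs):
    """Return (M, sigma) for a variation series with variants V and frequencies p."""
    n = sum(freqs)  # total number of observations
    if n < 2:
        raise ValueError("at least two observations are needed")
    # Step 1: arithmetic mean M (weighted by frequencies)
    mean = sum(v * p for v, p in zip(variants, freqs)) / n
    # Step 2: deviations d = V - M
    deviations = [v - mean for v in variants]
    # Steps 3-5: square each d, multiply by its frequency p, and sum the products
    sum_d2p = sum(d * d * p for d, p in zip(deviations, freqs))
    # Step 6: divide by n when n > 30, by n - 1 when n <= 30, then take the root
    denominator = n if n > 30 else n - 1
    return mean, math.sqrt(sum_d2p / denominator)


# Hypothetical pulse rates (beats per minute) and how many patients had each
pulse = [60, 64, 68, 72, 76]
patients = [2, 5, 8, 4, 1]
m, sigma = weighted_sd(pulse, patients)
print(f"M = {m:.1f}, sigma = {sigma:.2f}")  # M = 67.4, sigma = 4.16
```

With 20 observations the n − 1 branch is used; for a series with more than 30 observations the same function divides by n.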

Significance of the standard deviation:

1. The standard deviation characterizes the spread of the variants around the mean (i.e., the variability of the variation series). The larger the sigma, the greater the diversity of the series.

2. The standard deviation is used to assess how well the arithmetic mean represents the variation series for which it was calculated.

Many mass phenomena vary according to the law of normal distribution. The curve representing this distribution is a smooth, symmetrical, bell-shaped curve (the Gaussian curve). According to probability theory, for phenomena that follow the normal distribution there is a strict mathematical relationship between the arithmetic mean and the standard deviation. The theoretical distribution of variants in a homogeneous variation series obeys the three-sigma rule.

If the values of a quantitative characteristic (variants) are plotted on the abscissa of a rectangular coordinate system, and the frequency of each variant in the series on the ordinate, then variants with larger and smaller values are distributed evenly on either side of the arithmetic mean.

It has been established that with a normal distribution of the trait:

68.3% of the variants lie within M ± 1σ

95.5% of the variants lie within M ± 2σ

99.7% of the variants lie within M ± 3σ

3. The standard deviation makes it possible to establish normal values for clinical and biological parameters. In medicine, the interval M ± 1σ is usually taken as the normal range for the phenomenon being studied. A deviation of the measured value from the arithmetic mean by more than 1σ indicates that the parameter deviates from the norm.

4. In medicine, the three-sigma rule is used in pediatrics for the individual assessment of children's physical development (the sigma deviation method) and for developing standards for children's clothing.

5. The standard deviation is necessary to characterize the degree of diversity of the characteristic being studied and to calculate the error of the arithmetic mean.
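The percentages of the three-sigma rule follow from the normal distribution itself. As a quick check, the share of a normal distribution lying within M ± kσ equals erf(k/√2); the sketch below uses Python's standard `math.erf` and is only an illustration of that identity.

```python
import math


def share_within(k):
    """Theoretical share of a normally distributed trait lying within M ± k·sigma."""
    # For a normal variable X: P(|X - M| < k * sigma) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"M ± {k}σ: {share_within(k):.1%}")
```

This prints 68.3%, 95.4% and 99.7%. The middle value prints as 95.4% because the exact share is about 95.45%; textbooks commonly round it to 95.5%.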

The standard deviation is usually used to compare the variability of series of the same type. If two series with different characteristics are compared (height and weight, average duration of hospital treatment and hospital mortality, etc.), a direct comparison of their sigmas is impossible, because the standard deviation is a named quantity expressed in absolute units. In these cases the coefficient of variation (Cv) is used. It is a relative value: the standard deviation expressed as a percentage of the arithmetic mean.

The coefficient of variation is calculated using the formula:

$$C_v = \frac{\sigma}{M} \times 100\%$$

The higher the coefficient of variation, the greater the variability of the series. A coefficient of variation above 30% is considered to indicate qualitative heterogeneity of the population.
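A short sketch of why Cv, unlike σ, can compare traits measured in different units. The height and weight figures are hypothetical, and σ is computed with the n − 1 denominator (Python's `statistics.stdev`), matching the rule for n ≤ 30.

```python
import statistics


def coefficient_of_variation(values):
    """Cv = sigma / M * 100 %, with sigma computed using the n - 1 denominator."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical measurements of the same five people
heights_cm = [158, 162, 165, 170, 175]
weights_kg = [52, 58, 63, 70, 82]

for name, data in (("height", heights_cm), ("weight", weights_kg)):
    print(f"{name}: sigma = {statistics.stdev(data):.2f}, Cv = {coefficient_of_variation(data):.1f}%")
```

The two sigmas (6.67 cm and 11.58 kg) cannot be compared directly, but the Cv values (4.0% and 17.8%) show that weight varies considerably more than height in this made-up group.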

Standard deviation is a classic indicator of variability from descriptive statistics.

Standard deviation (also called root-mean-square deviation or sample standard deviation; abbreviated STD or StDev) is a very common measure of dispersion in descriptive statistics. Because technical analysis is akin to statistics, this indicator can (and should) be used in technical analysis to measure how widely the price of the analyzed instrument is dispersed over time. It is denoted by the Greek letter sigma, "σ".

Thanks to Carl Gauss and Karl Pearson for giving us the standard deviation.

By using the standard deviation in technical analysis, we turn this "dispersion indicator" into a "volatility indicator", keeping the meaning but changing the terms.

What is standard deviation

Beyond its role in intermediate auxiliary calculations, the standard deviation is perfectly suitable as a standalone indicator in technical analysis. As an active reader of our magazine, burdock, noted: "I still don't understand why the standard deviation is not included in the set of standard indicators of domestic dealing centers."

Indeed, the standard deviation measures the variability of an instrument in a classic, "pure" way. Unfortunately, this indicator is not very common in securities analysis.

Applying standard deviation

Calculating the standard deviation by hand is not very exciting, but it is a useful exercise. The standard deviation can be expressed by the formula

$$\mathrm{STD} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

which reads: the square root of the sum of squared differences between the sample elements and the mean, divided by the number of elements in the sample.

If the sample contains 30 or fewer elements, the denominator of the fraction under the root is n − 1; otherwise n is used.

Step by step standard deviation calculation:

  1. calculate the arithmetic mean of the data sample
  2. subtract this average from each sample element
  3. square all the resulting differences
  4. sum up all the resulting squares
  5. divide the resulting sum by the number of elements in the sample (or by n − 1 if n ≤ 30)
  6. take the square root of the resulting quotient (the quotient itself is the variance, or dispersion)
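The six steps above, applied to a price series, can be sketched as follows. This is a minimal illustration under stated assumptions: `std_dev` follows the denominator rule given in this section (n − 1 for 30 or fewer elements), `rolling_std` is a hypothetical helper that recomputes the value over a sliding window to turn it into a volatility indicator, and the closing prices are made up.

```python
def std_dev(sample):
    """Standard deviation computed step by step as described above."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two elements")
    mean = sum(sample) / n                        # 1. arithmetic mean
    diffs = [x - mean for x in sample]            # 2. subtract the mean from each element
    squares = [d * d for d in diffs]              # 3. square the differences
    total = sum(squares)                          # 4. sum the squares
    variance = total / (n - 1 if n <= 30 else n)  # 5. divide (n - 1 when n <= 30)
    return variance ** 0.5                        # 6. square root of the variance


def rolling_std(prices, window):
    """Hypothetical volatility indicator: std_dev over each sliding window of prices."""
    return [std_dev(prices[i - window + 1 : i + 1]) for i in range(window - 1, len(prices))]


# Hypothetical closing prices of an instrument
closes = [100.0, 101.0, 103.0, 102.0, 105.0, 110.0, 108.0]
print([round(v, 2) for v in rolling_std(closes, 5)])
```

A rising value signals growing dispersion of prices, i.e. higher volatility of the instrument.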

In this article I will talk about how to find the standard deviation. This material is extremely important for a full understanding of mathematics, so a math tutor should devote a separate lesson, or even several, to it.

The standard deviation makes it possible to evaluate the spread of values obtained by measuring a certain parameter. It is denoted by the symbol σ (the Greek letter "sigma").

The formula for calculation is quite simple. To find the standard deviation, you need to take the square root of the variance. So now you have to ask, “What is variance?”

What is variance

Variance is defined as follows: the variance (dispersion) is the arithmetic mean of the squared deviations of the values from the mean.

To find the variance, perform the following calculations sequentially:

  • Determine the average (simple arithmetic average of a series of values).
  • Then subtract the mean from each value and square the resulting difference (this gives the squared differences).
  • Finally, calculate the arithmetic mean of these squared differences (why squares are used is explained below).

Let's look at an example. Let's say you and your friends decide to measure the height of your dogs (in millimeters). As a result of the measurements, you received the following height measurements (at the withers): 600 mm, 470 mm, 170 mm, 430 mm and 300 mm.

Let's calculate the mean, variance and standard deviation.

First, let's find the mean. As you already know, to do this you add up all the measured values and divide by the number of measurements:

$$\text{Mean} = \frac{600 + 470 + 170 + 430 + 300}{5} = \frac{1970}{5} = 394 \text{ mm}$$

So, the average (arithmetic mean) is 394 mm.

Now we need to determine the deviation of each dog's height from the mean:

600 − 394 = 206; 470 − 394 = 76; 170 − 394 = −224; 430 − 394 = 36; 300 − 394 = −94 (mm).

Finally, to calculate variance, we square each of the resulting differences, and then find the arithmetic mean of the results obtained:

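$$\text{Variance} = \frac{206^2 + 76^2 + (-224)^2 + 36^2 + (-94)^2}{5} = \frac{42436 + 5776 + 50176 + 1296 + 8836}{5} = \frac{108520}{5} = 21704 \text{ mm}^2$$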

Thus, the variance is 21704 mm².

How to find standard deviation

So how can we now calculate the standard deviation, knowing the variance? As we remember, take the square root of it. That is, the standard deviation is equal to:

$$\sigma = \sqrt{21704} \approx 147 \text{ mm}$$

(rounded to the nearest whole millimeter).
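The dog example can be checked with Python's standard `statistics` module. This sketch treats the five heights as the whole population, so it uses `pvariance` and `pstdev`, which divide by N.

```python
import statistics

heights_mm = [600, 470, 170, 430, 300]

mean = statistics.mean(heights_mm)           # arithmetic mean
variance = statistics.pvariance(heights_mm)  # population variance (divides by N)
sd = statistics.pstdev(heights_mm)           # population standard deviation

print(f"mean = {mean:g} mm, variance = {variance:g} mm², sigma ≈ {sd:.1f} mm")
# mean = 394 mm, variance = 21704 mm², sigma ≈ 147.3 mm
```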

Using this method, we found that some dogs (for example, Rottweilers) are very big dogs. But there are also very small dogs (for example, dachshunds, but you shouldn’t tell them that).

The most interesting thing is that the standard deviation carries useful information: we can now show which of the measured heights lie within the interval obtained by laying off one standard deviation on either side of the mean.

That is, the standard deviation gives us a "standard" way of telling which values are normal (close to the statistical average) and which are extraordinarily large or, conversely, small.
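Laying off one standard deviation on each side of the mean can be sketched like this, using the population σ as in the example; the list names are illustrative.

```python
import statistics

heights_mm = [600, 470, 170, 430, 300]
m = statistics.mean(heights_mm)
s = statistics.pstdev(heights_mm)
low, high = m - s, m + s  # roughly 246.7 to 541.3 mm

typical = [h for h in heights_mm if low <= h <= high]
unusual = [h for h in heights_mm if not (low <= h <= high)]
print(typical, unusual)  # [470, 430, 300] [600, 170]
```

So the 600 mm dog (a Rottweiler, say) is unusually big and the 170 mm dog unusually small, while the other three are typical.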


But... everything is a little different if we analyze sample data. In our example we treated the data as a general population: our 5 dogs were the only dogs in the world that interested us.

But if the data are a sample (values selected from a larger population), the calculation must be done differently.

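If there are N values in a sample, the sum of squared deviations is divided by N − 1 instead of N:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{N - 1}$$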

All other calculations are carried out similarly, including the determination of the average.

For example, if our five dogs are just a sample of the population of dogs (all dogs on the planet), we must divide by 4, not 5, namely:

$$\text{Sample variance} = \frac{206^2 + 76^2 + (-224)^2 + 36^2 + (-94)^2}{4} = \frac{108520}{4} = 27130 \text{ mm}^2$$

In this case, the standard deviation of the sample is √27130 ≈ 165 mm (rounded to the nearest whole number).
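Python's `statistics` module distinguishes the two cases: `variance` and `stdev` divide by N − 1 (sample), while `pvariance` and `pstdev` divide by N (population). A quick comparison on the same five heights:

```python
import statistics

heights_mm = [600, 470, 170, 430, 300]

# Treating the five dogs as a sample of all dogs: divide by N - 1 = 4
sample_var = statistics.variance(heights_mm)
sample_sd = statistics.stdev(heights_mm)
# Treating them as the whole population: divide by N = 5
population_sd = statistics.pstdev(heights_mm)

print(f"{sample_var:g} {sample_sd:.1f} {population_sd:.1f}")  # 27130 164.7 147.3
```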

We can say that we have made some “correction” in the case where our values ​​are just a small sample.

Note. Why exactly squared differences?

But why exactly do we square the differences when calculating the variance? Suppose that, when measuring some parameter, you obtained the following deviations from the mean: 4; 4; −4; −4. If we simply add the deviations (the differences) together, the negative values cancel out the positive ones:

$$\frac{4 + 4 - 4 - 4}{4} = 0$$

It turns out that this approach is useless. Then maybe it is worth trying the absolute values of the deviations (that is, their moduli)?

At first glance, this works well: (|4| + |4| + |−4| + |−4|) / 4 = 4 (this quantity, by the way, is called the mean absolute deviation). But it does not work in all cases. Let's try another example, in which the measurements give the deviations 7; 1; −6; −2. Then the mean absolute deviation is:

$$\frac{|7| + |1| + |-6| + |-2|}{4} = \frac{16}{4} = 4$$

Wow! Again we got a result of 4, although the differences have a much larger spread.

Now let's see what happens if we square the differences (and then take the square root of their mean).

For the first example it will be:

$$\sqrt{\frac{4^2 + 4^2 + (-4)^2 + (-4)^2}{4}} = \sqrt{\frac{64}{4}} = 4$$

For the second example it will be:

$$\sqrt{\frac{7^2 + 1^2 + (-6)^2 + (-2)^2}{4}} = \sqrt{\frac{90}{4}} \approx 4.74$$

Now it’s a completely different matter! The greater the spread of the differences, the greater the standard deviation is... which is what we were aiming for.
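The comparison in this note is easy to reproduce. In this sketch `mean_abs_dev` and `rms` are illustrative helper names; both take the deviations from the mean directly, as the note does.

```python
def mean_abs_dev(deviations):
    """Mean absolute deviation: average of |d|."""
    return sum(abs(d) for d in deviations) / len(deviations)


def rms(deviations):
    """Square root of the mean of squared deviations (the standard deviation)."""
    return (sum(d * d for d in deviations) / len(deviations)) ** 0.5


first = [4, 4, -4, -4]
second = [7, 1, -6, -2]

print(sum(first) / len(first), sum(second) / len(second))  # 0.0 0.0
print(mean_abs_dev(first), mean_abs_dev(second))            # 4.0 4.0
print(rms(first), round(rms(second), 2))                    # 4.0 4.74
```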

In fact, this method uses the same idea as calculating the distance between points, only applied in a different way.

And from a mathematical point of view, squares and square roots are more useful than the absolute values of deviations, which makes the standard deviation applicable to other mathematical problems.

Sergey Valerievich told you how to find the standard deviation

Lesson No. 4

Topic: “Descriptive statistics. Indicators of trait diversity in the aggregate”

The main criteria for the diversity of a characteristic in a statistical population are the limit, amplitude, standard deviation, coefficient of oscillation and coefficient of variation. The previous lesson showed that average values give only a generalized characteristic of the trait being studied and do not take into account the values of its individual variants: the minimum and maximum values, values above and below the average, etc.

Example. The mean values of two different number sequences, −100; −20; 100; 20 and 0.1; −0.2; 0.1, are identical: both equal 0. However, the scatter of these data around the mean is very different.

The listed criteria of diversity are determined primarily from the values of the characteristic in the individual elements of the statistical population.

Indicators for measuring the variation of a trait are absolute and relative. Absolute indicators of variation include the range of variation, limit, standard deviation and variance (dispersion). The coefficient of variation and the coefficient of oscillation are relative measures of variation.

Limit (lim) is a criterion determined by the extreme values of the variants in a variation series. In other words, it is bounded by the minimum and maximum values of the attribute:

$$\mathrm{lim} = (V_{\min};\ V_{\max})$$

Amplitude (Am), or range of variation, is the difference between the extreme variants. It is calculated by subtracting the minimum value of the attribute from the maximum value, which allows us to estimate the degree of scatter of the variants:

$$Am = V_{\max} - V_{\min}$$
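Applied to the two sequences from the example above (both with mean 0), the limit and amplitude make the difference in scatter visible. A minimal sketch; `limit_and_amplitude` is a hypothetical helper name.

```python
def limit_and_amplitude(series):
    """Return the limit (min, max) and the amplitude Am = max - min of a series."""
    v_min, v_max = min(series), max(series)
    return (v_min, v_max), v_max - v_min


a = [-100, -20, 100, 20]
b = [0.1, -0.2, 0.1]

for series in (a, b):
    lim, am = limit_and_amplitude(series)
    print(f"mean = {sum(series) / len(series):g}, lim = {lim}, Am = {am:g}")
```

The same mean hides very different spreads: Am is 200 for the first sequence and 0.3 for the second.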

The disadvantage of the limit and amplitude as criteria of variability is that they depend entirely on the extreme values of the characteristic in the variation series; fluctuations of the attribute values within the series are not taken into account.

The most complete description of the diversity of a trait in a statistical population is provided by the root-mean-square deviation (sigma, σ), a general measure of the deviation of the variants from their mean. The root-mean-square deviation is often called the standard deviation.

The standard deviation is based on comparing each variant with the arithmetic mean of the population. Since a population always contains variants both smaller and larger than the mean, the sum of deviations with the "−" sign is cancelled out by the sum of deviations with the "+" sign, i.e. the sum of all deviations is zero. To eliminate the influence of the signs, the deviations from the arithmetic mean are squared: d² = (V − M)². The sum of squared deviations is not zero. To obtain a coefficient that measures variability, the mean of the sum of squares is taken; this value is called the variance (dispersion):

$$\sigma^2 = \frac{\sum d^2}{n}$$

In essence, the variance is the mean square of the deviations of individual values of a characteristic from its mean. The variance is the square of the standard deviation.

The variance is a dimensional (named) quantity. If the variants of a number series are expressed in meters, the variance is in square meters; if they are expressed in kilograms, the variance is in kg², etc.

Standard deviation is the square root of the variance:

$$\sigma = \sqrt{\frac{\sum d^2}{n}}$$

If the number of observations is small (n ≤ 30), n − 1 must be used instead of n in the denominator when calculating the variance and the standard deviation.

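The calculation of the standard deviation can be divided into six stages, which must be carried out in a fixed sequence: find the arithmetic mean M; find the deviations d = V − M; square the deviations; multiply the squares by the frequencies p; sum the products, Σd²p; divide the sum by n (or by n − 1 for small samples) and take the square root.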

Application of standard deviation:

a) to judge the variability of variation series and to compare the typicality (representativeness) of arithmetic means. This is necessary in differential diagnosis when determining the stability of symptoms.

b) to reconstruct the variation series, i.e. to restore its frequency distribution using the three-sigma rule: 99.7% of all variants of the series lie within M ± 3σ, 95.5% within M ± 2σ, and 68.3% within M ± 1σ (Fig. 1).

c) to identify outlying (“pop-up”) variants

d) to determine the parameters of norm and pathology using sigma estimates

e) to calculate the coefficient of variation

f) to calculate the average error of the arithmetic mean.

To characterize any population with a normal distribution, it is enough to know two parameters: the arithmetic mean and the standard deviation.

Figure 1. Three Sigma rule

Example.

In pediatrics, the standard deviation is used to assess the physical development of children by comparing the data of a particular child with the corresponding standard indicators. The arithmetic mean of the physical development indicators of healthy children is taken as the standard. The comparison is carried out using special tables in which the standards are given together with their sigma scales. If the child's indicator lies within the standard (arithmetic mean) ±σ, the child's physical development (by this indicator) is considered normal. If it lies beyond ±σ but within ±2σ, there is a slight deviation from the norm. If the indicator goes beyond ±2σ, the child's physical development differs sharply from the norm (pathology is possible).
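The sigma deviation method amounts to measuring how many sigmas a child's indicator lies from the standard. A minimal sketch, assuming the ±1σ and ±2σ thresholds from the text; the standard height (110 cm) and its sigma (4.5 cm) are hypothetical numbers, not values from real growth tables.

```python
def sigma_category(value, standard, sigma):
    """Classify an indicator by its deviation from the standard, in sigma units."""
    z = (value - standard) / sigma  # deviation expressed in sigmas
    if abs(z) <= 1:
        return "normal"
    if abs(z) <= 2:
        return "slight deviation"
    return "sharp deviation (pathology possible)"


STANDARD_CM, SIGMA_CM = 110.0, 4.5  # hypothetical standard and its sigma
for height in (112.0, 118.0, 98.0):
    print(height, sigma_category(height, STANDARD_CM, SIGMA_CM))
```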

Besides variation indicators expressed in absolute values, statistical research uses variation indicators expressed in relative values. The oscillation coefficient is the ratio of the range of variation to the mean value of the trait. The coefficient of variation is the ratio of the standard deviation to the mean value of the trait. Both are usually expressed as percentages.

Formulas for calculating relative variation indicators:

$$K_{osc} = \frac{Am}{M} \times 100\%, \qquad V = \frac{\sigma}{M} \times 100\%$$

These formulas show that the closer the coefficient V is to zero, the smaller the variation in the values of the characteristic; the larger V, the more variable the trait.

In statistical practice, the coefficient of variation is most often used. It is used not only for a comparative assessment of variation, but also to characterize the homogeneity of the population. The population is considered homogeneous if the coefficient of variation does not exceed 33% (for distributions close to normal). Arithmetically, the ratio of σ and the arithmetic mean neutralizes the influence of the absolute value of these characteristics, and the percentage ratio makes the coefficient of variation a dimensionless (unnamed) value.

The resulting value of the coefficient of variation is assessed according to the following approximate gradations of the degree of diversity of the trait:

Weak - up to 10%

Average - 10 - 20%

Strong - more than 20%
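Both relative indicators and the gradations above can be combined in a short sketch. Assumptions: σ uses the n − 1 denominator (`statistics.stdev`), values of exactly 10% and 20% are assigned to the lower gradation because the text does not say, and the length-of-stay data are hypothetical.

```python
import statistics


def oscillation_coefficient(values):
    """K_osc = Am / M * 100 %, where Am = max - min."""
    return (max(values) - min(values)) / statistics.mean(values) * 100


def variation_coefficient(values):
    """V = sigma / M * 100 %."""
    return statistics.stdev(values) / statistics.mean(values) * 100


def diversity_grade(v):
    """Approximate gradation of the degree of diversity by V (in %)."""
    if v <= 10:
        return "weak"
    if v <= 20:
        return "average"
    return "strong"


stay_days = [8, 10, 12, 9, 21]  # hypothetical lengths of hospital stay
v = variation_coefficient(stay_days)
print(f"K_osc = {oscillation_coefficient(stay_days):.1f}%, V = {v:.1f}%, {diversity_grade(v)}")
```

Here V comes out at about 43.7%, well above 33%, so by the homogeneity criterion above this small made-up group would not be considered homogeneous.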

The use of the coefficient of variation is advisable in cases where it is necessary to compare characteristics that are different in size and dimension.

The difference between the coefficient of variation and other scatter criteria is clearly demonstrated by the following example.

Table 1

Composition of industrial enterprise workers

Based on the statistical characteristics given in the example, we can draw a conclusion about the relative homogeneity of the age composition and educational level of the enterprise’s employees, given the low professional stability of the surveyed contingent. It is easy to see that an attempt to judge these social trends by the standard deviation would lead to an erroneous conclusion, and an attempt to compare the accounting characteristics “work experience” and “age” with the accounting indicator “education” would generally be incorrect due to the heterogeneity of these characteristics.

Median and percentiles

For ordinal (rank) distributions, where the criterion for the middle of the series is the median, the standard deviation and dispersion cannot serve as characteristics of the dispersion of the variant.

The same is true for open variation series. This is because the deviations from which the variance and σ are calculated are measured from the arithmetic mean, which is not calculated in open variation series or in distributions of qualitative characteristics. Therefore, for a concise description of distributions, another scatter parameter is used: the quantile (synonym: “percentile”), which is suitable for describing qualitative and quantitative characteristics with any form of distribution. This parameter can also be used to convert quantitative characteristics into qualitative ones; in that case the ratings are assigned according to which quantile a particular variant falls into.

In the practice of biomedical research, the following quantiles are most often used:

Me – median;

Q1, Q3 – quartiles (quarters), where Q1 is the lower quartile and Q3 the upper quartile.

Quantiles divide the range of possible values of a variation series into certain intervals. The median (the 0.5 quantile) is the variant in the middle of the variation series; it divides the series into two equal halves (0.5 and 0.5). Quartiles divide the series into four parts: the lower quartile (Q1) separates the 25% of variants with the smallest values; the second quartile (the median) separates 50% of the variants; the upper quartile (Q3) separates 75% of the variants.

When a variable is distributed asymmetrically around the arithmetic mean, it is characterized by the median and quartiles, and the average value is written as Me (Q1; Q3). For example, the studied feature “age at which the child began to walk independently” has an asymmetric distribution in the study group. The lower quartile (Q1) corresponds to the start of walking at 9.5 months, the median to 11 months, and the upper quartile (Q3) to 12 months. Accordingly, the central tendency of this attribute is presented as 11 (9.5; 12) months.
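Python's `statistics.median` and `statistics.quantiles` (Python 3.8+) give this Me (Q1; Q3) summary directly. A sketch under assumptions: the ages are made-up values chosen to reproduce the 11 (9.5; 12) example, and `quantiles` is used with its default "exclusive" method; other quartile conventions can give slightly different Q1 and Q3 on the same data.

```python
import statistics

# Hypothetical ages (months) at which 11 children began to walk independently
ages = [9, 9, 9.5, 10, 10.5, 11, 11.5, 12, 12, 12.5, 13]

me = statistics.median(ages)
q1, _, q3 = statistics.quantiles(ages, n=4)  # lower quartile, median, upper quartile
print(f"{me:g} ({q1:g}; {q3:g}) months")  # 11 (9.5; 12) months
```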

Assessing the statistical significance of the study results

The statistical significance (reliability) of data is understood as the degree to which the data correspond to the reality they describe: statistically significant data are those that do not distort and correctly reflect objective reality.

Assessing the statistical significance of research results means determining the probability with which the results obtained from a sample can be transferred to the entire (general) population. This assessment is needed to understand to what extent a part of a phenomenon (the sample) can be used to judge the phenomenon as a whole and its patterns.

The assessment of the statistical significance of research results includes:

1. errors of representativeness (standard errors of mean and relative values), m;

2. confidence limits of mean or relative values;

3. the significance of the difference between mean or relative values according to the t criterion.

The standard error of the arithmetic mean, or representativeness error, characterizes the fluctuations of the mean. The larger the sample, the smaller the spread of sample means. The standard error of the mean is calculated using the formula:

$$m_M = \frac{\sigma}{\sqrt{n}}$$

where σ is the standard deviation and n is the number of observations.

In modern scientific literature, the arithmetic mean is written together with the representativeness error, $M \pm m$, or together with the standard deviation, $M \pm \sigma$.
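The formula $m_M = \sigma/\sqrt{n}$ and the two reporting forms can be sketched in a few lines. The summary values (mean 120.0, σ = 12.0, n = 144) are hypothetical; the second call shows that quadrupling the sample size halves the standard error.

```python
import math

def standard_error_of_mean(sigma, n):
    """m = sigma / sqrt(n): the representativeness error of the mean."""
    return sigma / math.sqrt(n)

def report(mean, sigma, n):
    """Format the mean both ways used in the literature: M ± m and M ± sigma."""
    m = standard_error_of_mean(sigma, n)
    return f"{mean:.1f} ± {m:.1f}", f"{mean:.1f} ± {sigma:.1f}"

with_error, with_sd = report(120.0, 12.0, 144)  # hypothetical summary statistics
print(with_error)                          # 120.0 ± 1.0
print(with_sd)                             # 120.0 ± 12.0
print(standard_error_of_mean(12.0, 576))   # 0.5 (four times the sample, half the error)
```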

As an example, consider data on the 1,500 city clinics in the country (the general population). The average number of patients served per clinic is 18,150. A random 10% sample (150 clinics) gives an average of 20,051 patients. The sampling error, which arises because not all 1,500 clinics were included in the sample, equals the difference between the general mean ($M_{gen}$) and the sample mean ($M_{sample}$), here 20,051 − 18,150 = 1,901 patients. Another sample of the same size drawn from the same population would give a different error. If samples of the same size are drawn from the general population many times, then for sufficiently large samples the sample means are normally distributed around the general mean. The standard error of the mean, m, describes this inevitable spread of sample means around the general mean.
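The idea that m is the spread of sample means around the general mean can be checked by simulation. This sketch makes assumptions that are not in the text: the population is modelled as an infinite normal population with the mean of the clinic example and a hypothetical σ of 3,000 patients (the text gives no σ), and the finite-population correction for the 1,500 real clinics is ignored. The empirical spread of 2,000 sample means should come out close to σ/√n.

```python
import math
import random
import statistics

def spread_of_sample_means(mu, sigma, n, repeats, seed=1):
    """Draw `repeats` samples of size n from a normal population N(mu, sigma)
    and return the standard deviation of their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

# Hypothetical population: mean 18,150 patients, sigma = 3,000 (assumed, not from the text).
observed = spread_of_sample_means(18150, 3000, n=150, repeats=2000)
predicted = 3000 / math.sqrt(150)  # m = sigma / sqrt(n), about 245
print(round(observed), round(predicted))
```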

When the research results are expressed as relative values (for example, percentages), the standard error of the proportion is calculated:

$$m_P = \sqrt{\frac{P\,(100 - P)}{n}}$$

where P is the indicator in %, and n is the number of observations.

The result is displayed as (P ± m)%. For example, the percentage of recovery among patients was (95.2±2.5)%.
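The standard error of a percentage and the (P ± m)% notation can be sketched as follows. The recovery example in the text does not state n, so the counts here (40 recoveries among 50 patients) are hypothetical.

```python
import math

def proportion_error(p_percent, n):
    """Standard error of a proportion expressed in %: m = sqrt(P * (100 - P) / n)."""
    return math.sqrt(p_percent * (100 - p_percent) / n)

def report_percent(p_percent, n):
    """Format the result as (P ± m)%."""
    m = proportion_error(p_percent, n)
    return f"({p_percent:.1f} ± {m:.1f})%"

p = 100 * 40 / 50               # hypothetical: 40 recoveries out of 50 patients, P = 80%
print(report_percent(p, 50))    # (80.0 ± 5.7)%
```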

If the number of observations is small (n < 30), n − 1 should be used instead of n in the denominator when calculating the standard errors of the mean and of the proportion.

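For a normal distribution (and the distribution of sample means is normal), we know what share of the values falls within any interval around the mean. In particular, of all sample means:

– 68.3% lie within $M_{gen} \pm m$;

– 95.5% lie within $M_{gen} \pm 2m$;

– 99.7% lie within $M_{gen} \pm 3m$.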

In practice, the problem is that the characteristics of the general population are unknown to us, and the sample is taken precisely to estimate them. This means that if we draw samples of the same size n from the general population, then in 68.3% of cases the interval $M_{sample} \pm m$ will contain the value $M_{gen}$ (in 95.5% of cases it will lie in the interval $M_{sample} \pm 2m$, and in 99.7% of cases in the interval $M_{sample} \pm 3m$).

Since only one sample is actually taken, this statement is formulated in terms of probability: with a probability of 68.3%, the mean value of the characteristic in the population lies in the interval $M_{sample} \pm m$; with a probability of 95.5%, in the interval $M_{sample} \pm 2m$; and so on.
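The percentages 68.3%, 95.5% and 99.7% are the shares of a normal distribution within ±1, ±2 and ±3 standard deviations of its mean, which can be computed with the error function: $P(|Z| < k) = \operatorname{erf}(k/\sqrt{2})$. The sketch prints them to two decimals; the 95.5% used in the text is 95.45% rounded up.

```python
import math

def share_within(k):
    """Share of a normal distribution within mean ± k standard deviations
    (here: the share of sample means within M_gen ± k*m)."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, f"{share_within(k):.2%}")
# 1 68.27%
# 2 95.45%
# 3 99.73%
```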

In practice, an interval is constructed around the sample value such that, with a given (sufficiently high) probability, called the confidence probability, it "covers" the true value of the parameter in the general population. This interval is called the confidence interval.

The confidence probability P is the degree of confidence that the confidence interval actually contains the true (unknown) value of the parameter in the population.

For example, if the confidence probability P is 90%, this means that 90 samples out of 100 will give a correct estimate of the parameter in the population. Accordingly, the probability of error, i.e. of an incorrect estimate of the general mean from the sample, is, in per cent, 100% − P = 10%: 10 samples out of 100 will give an incorrect estimate.

Obviously, the degree of confidence (the confidence probability) depends on the width of the interval: the wider the interval, the higher the confidence that the unknown population value falls into it. In practice, at least twice the sampling error (±2m) is used to construct a confidence interval, which gives a confidence probability of at least 95.5%.
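The frequency interpretation above ("so many samples out of 100 give a correct estimate") can be illustrated by simulation. In this sketch the population N(100, 15), the sample size of 100 and the seed are all hypothetical; each sample's interval $M_{sample} \pm 2m$ is built from that sample's own standard deviation, and the share of intervals covering the true mean is counted. Because σ is estimated from each sample, the share comes out slightly below 95.5%, at roughly 95%.

```python
import math
import random

def coverage(mu, sigma, n, t, repeats, seed=7):
    """Share of samples whose interval M_sample ± t*m contains the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        # sample standard deviation (n - 1), then m = s / sqrt(n)
        s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        m = s / math.sqrt(n)
        if mean - t * m <= mu <= mean + t * m:
            hits += 1
    return hits / repeats

# Hypothetical population N(100, 15); 2,000 samples of 100 observations each.
share = coverage(mu=100, sigma=15, n=100, t=2, repeats=2000)
print(share)  # roughly 0.95: about 95 of every 100 intervals cover mu
```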

Determining the confidence limits of mean and relative values gives their two extreme values, the minimum possible and the maximum possible, between which the studied indicator can lie in the entire population. Accordingly, confidence limits (the confidence interval) are the boundaries of mean or relative values that the indicator can cross because of random fluctuations only with negligible probability.

The confidence interval can be written as $M_{sample} \pm t \cdot m$, where t is the confidence criterion.

The confidence limits of the arithmetic mean in the general population are determined by the formula:

$$M_{gen} = M_{sample} \pm t \cdot m_M$$

and for a relative value:

$$P_{gen} = P_{sample} \pm t \cdot m_P$$

where $M_{gen}$ and $P_{gen}$ are the mean and relative values for the general population; $M_{sample}$ and $P_{sample}$ are the mean and relative values obtained from the sample; $m_M$ and $m_P$ are the standard errors of the mean and relative values; t is the confidence criterion (accuracy criterion), which is set when planning the study and is usually equal to 2 or 3; $t \cdot m$ is the half-width of the confidence interval, or Δ, the maximum error of the indicator obtained in a sample study.
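The two formulas share one pattern, estimate ± t·error, which can be sketched as a single helper. The sample values passed in are hypothetical.

```python
def confidence_limits(estimate, error, t=2):
    """Return (lower, upper) = estimate ± t*error; t is 2 or 3 in the text."""
    delta = t * error  # Δ, the maximum error of the sample indicator
    return estimate - delta, estimate + delta

# Hypothetical mean: M_sample = 120.0, m_M = 1.0
print(confidence_limits(120.0, 1.0))       # (118.0, 122.0)
print(confidence_limits(120.0, 1.0, t=3))  # (117.0, 123.0)
# Hypothetical relative value: P_sample = 80.0%, m_P = 5.0%
print(confidence_limits(80.0, 5.0))        # (70.0, 90.0)
```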

The value of the criterion t is linked to the probability of an error-free forecast (p), expressed in %. The researcher chooses it according to the degree of accuracy required: for a probability of an error-free forecast of 95.5%, t = 2; for 99.7%, t = 3.
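For large samples, the link between t and the probability of an error-free forecast is the normal quantile that leaves (1 − p)/2 in each tail. This sketch uses `statistics.NormalDist` from the standard library; it is a large-sample approximation, which is why it gives about 2.00 for 95.5% and about 2.97 (rounded to 3 in the text) for 99.7%.

```python
from statistics import NormalDist

def t_for(confidence):
    """Large-sample t: the normal quantile leaving (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

for p in (0.95, 0.955, 0.997):
    print(f"{p:.1%}: t = {t_for(p):.2f}")
```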

These confidence interval estimates are valid only for statistical populations with more than 30 observations. For smaller populations (small samples), special tables are used to determine the t criterion: the required value is found at the intersection of the row corresponding to the number of degrees of freedom (n − 1) and the column corresponding to the probability of an error-free forecast chosen by the researcher (95.5% or 99.7%). In medical research, confidence limits for any indicator are established with a probability of an error-free forecast of 95.5% or more. This means that the interval built from the sample must cover the value of the indicator in the general population in at least 95.5% of cases.
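Instead of looking t up in a table, the tabulated value can be computed. Python's standard library has no Student's t distribution (with SciPy, `scipy.stats.t.ppf(0.975, 9)` gives the same two-sided 95% value), so this sketch integrates the t density with Simpson's rule and solves for the critical value by bisection. The sample size n = 10 is hypothetical; as in the tables, the row is the number of degrees of freedom, n − 1. For 95% and 9 degrees of freedom the result matches the familiar table value 2.262.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def central_probability(t, df, steps=2000):
    """P(-t < T < t), integrating the density over [0, t] with Simpson's rule."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3  # doubled because the density is symmetric about 0

def t_critical(confidence, df):
    """Find t with central_probability(t, df) == confidence by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if central_probability(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n = 10  # hypothetical small sample
print(f"{t_critical(0.95, n - 1):.3f}")  # 2.262
```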

Questions on the topic of the lesson:

1. Relevance of indicators of trait diversity in a statistical population.

2. General characteristics of absolute variation indicators.

3. Standard deviation: calculation and application.

4. Relative measures of variation.

5. Median and quartiles.

6. Assessing the statistical significance of study results.

7. Standard error of the arithmetic mean: calculation formula, example of use.

8. Calculation of a proportion and its standard error.

9. The concept of confidence probability, an example of use.

10. The concept of a confidence interval and its application.

Test tasks on the topic with standard answers:

1. ABSOLUTE INDICATORS OF VARIATION INCLUDE

1) coefficient of variation

2) oscillation coefficient

4) median

2. RELATIVE INDICATORS OF VARIATION INCLUDE

1) dispersion

4) coefficient of variation

3. THE INDICATOR DETERMINED BY THE EXTREME VALUES OF A VARIATION SERIES IS

2) amplitude

3) dispersion

4) coefficient of variation

4. THE DIFFERENCE BETWEEN THE EXTREME VALUES OF A SERIES IS

2) amplitude

3) standard deviation

4) coefficient of variation

5. THE MEAN SQUARE OF THE DEVIATIONS OF INDIVIDUAL VALUES OF A CHARACTERISTIC FROM ITS MEAN VALUE IS

1) oscillation coefficient

2) median

3) dispersion

6. THE RATIO OF THE RANGE OF VARIATION TO THE MEAN VALUE OF A CHARACTERISTIC IS

1) coefficient of variation

2) standard deviation

4) oscillation coefficient

7. THE RATIO OF THE STANDARD DEVIATION TO THE MEAN VALUE OF A CHARACTERISTIC IS

1) dispersion

2) coefficient of variation

3) oscillation coefficient

4) amplitude

8. THE VALUE THAT IS IN THE MIDDLE OF A VARIATION SERIES AND DIVIDES IT INTO TWO EQUAL PARTS IS

1) median

3) amplitude

9. IN MEDICAL RESEARCH, WHEN ESTABLISHING CONFIDENCE LIMITS FOR ANY INDICATOR, THE ACCEPTED PROBABILITY OF AN ERROR-FREE PREDICTION IS

10. IF 90 SAMPLES OUT OF 100 GIVE A CORRECT ESTIMATE OF A PARAMETER IN THE POPULATION, THE CONFIDENCE PROBABILITY P IS EQUAL TO

11. IF 10 SAMPLES OUT OF 100 GIVE AN INCORRECT ESTIMATE, THE PROBABILITY OF ERROR IS EQUAL TO

12. THE LIMITS OF MEAN OR RELATIVE VALUES THAT RANDOM FLUCTUATIONS CAN CROSS ONLY WITH NEGLIGIBLE PROBABILITY ARE

1) confidence interval

2) amplitude

4) coefficient of variation

13. A SMALL SAMPLE IS A POPULATION IN WHICH

1) n is less than or equal to 100

2) n is less than or equal to 30

3) n is less than or equal to 40

4) n is close to 0

14. FOR A PROBABILITY OF AN ERROR-FREE FORECAST OF 95%, THE VALUE OF THE CRITERION t IS

15. FOR A PROBABILITY OF AN ERROR-FREE FORECAST OF 99%, THE VALUE OF THE CRITERION t IS

16. FOR DISTRIBUTIONS CLOSE TO NORMAL, THE POPULATION IS CONSIDERED HOMOGENEOUS IF THE COEFFICIENT OF VARIATION DOES NOT EXCEED

17. THE VALUE BELOW WHICH 25% OF THE VALUES OF A RANKED SERIES LIE IS

2) lower quartile

3) upper quartile

4) quartile

18. DATA THAT DOES NOT DISTORT AND CORRECTLY REFLECTS OBJECTIVE REALITY IS CALLED

1) impossible

2) equally possible

3) reliable

4) random

19. ACCORDING TO THE "THREE SIGMA" RULE, WITH A NORMAL DISTRIBUTION OF A CHARACTERISTIC, WITHIN … WILL BE LOCATED

1) 68.3% of the values

Standard deviation (synonyms: root mean square deviation, mean square deviation, quadratic deviation; related term: standard spread) is, in probability theory and statistics, the most common measure of the dispersion of the values of a random variable about its mathematical expectation. For finite samples, the arithmetic mean of the sample is used instead of the mathematical expectation.


    The standard deviation is measured in the same units as the random variable itself and is used when calculating the standard error of the arithmetic mean, constructing confidence intervals, statistically testing hypotheses, and measuring the linear relationship between random variables. It is defined as the square root of the variance of the random variable.

    Root mean square deviation σ:

    $$\sigma=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}$$

    Standard deviation s (an estimate of the root mean square deviation of a random variable x relative to its mathematical expectation, based on an unbiased estimate of its variance):

    $$s=\sqrt{\frac{n}{n-1}\sigma^2}=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}$$

    where $\sigma^2$ is the variance; $x_i$ is the i-th element of the sample; $n$ is the sample size; $\bar{x}$ is the arithmetic mean of the sample:

    $$\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i=\frac{1}{n}\left(x_1+\ldots+x_n\right)$$

    • Note: the names RMS deviation (root mean square deviation) and STD (standard deviation) are often matched inconsistently with their formulas. For example, in the NumPy module of the Python programming language the std() function is described as the "standard deviation", yet by default it computes the root mean square deviation σ (the sum of squares is divided by n). The Excel function STDEV() (СТАНДОТКЛОН in the Russian localization) computes s instead (division by n − 1).

    It should be noted that both estimates are biased. In the general case, it is impossible to construct an unbiased estimate. However, the estimate based on the unbiased variance estimate is consistent.

    In accordance with GOST R 8.736-2011, the standard deviation is calculated using the formula for s (with n − 1 in the denominator).
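The two formulas above differ only in the denominator (n versus n − 1). In Python's standard `statistics` module, `pstdev` corresponds to σ and `stdev` to s; in NumPy the same choice is made with the `ddof` argument of `numpy.std` (`ddof=0` by default, `ddof=1` for s). The eight-value sample below is hypothetical; the last line checks the relation $s=\sqrt{n/(n-1)}\,\sigma$.

```python
import math
import statistics

x = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, n = 8
n = len(x)

sigma = statistics.pstdev(x)  # divides the sum of squared deviations by n
s = statistics.stdev(x)       # divides by n - 1 (unbiased variance estimate)

print(sigma)                                            # 2.0
print(round(s, 4))                                      # 2.1381
print(math.isclose(s, math.sqrt(n / (n - 1)) * sigma))  # True
```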

    Three sigma rule

    Three sigma rule ($3\sigma$): almost all values of a normally distributed random variable lie in the interval $(\bar{x}-3\sigma;\ \bar{x}+3\sigma)$. More precisely, a normally distributed random variable lies in this interval with a probability of approximately 0.9973 (provided that $\bar{x}$ is the true value rather than one obtained by processing a sample).

    If the true value of $\bar{x}$ is unknown, s should be used instead of σ, and the three sigma rule becomes the three s rule.
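A quick simulation can illustrate the three s rule. The sample is generated with hypothetical parameters (mean 50, σ = 10) and a fixed seed; because $\bar{x}$ and s are both estimated from the same sample, the share of values inside $\bar{x} \pm 3s$ comes out close to, but not exactly, 0.9973.

```python
import random
import statistics

rng = random.Random(3)
# Simulated (hypothetical) normally distributed sample of 100,000 values.
data = [rng.gauss(50.0, 10.0) for _ in range(100_000)]

x_bar = statistics.fmean(data)
s = statistics.stdev(data)  # true mean unknown, so s is used: the "three s" rule
inside = sum(x_bar - 3 * s < x < x_bar + 3 * s for x in data) / len(data)
print(f"{inside:.4f}")  # close to 0.9973
```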

    Interpretation of the standard deviation value

    A larger standard deviation indicates a greater spread of the values in a set around the mean of the set; a smaller one indicates that the values are grouped closely around the mean.

    For example, consider three sets of numbers: (0, 0, 14, 14), (0, 6, 8, 14) and (6, 6, 8, 8). All three have a mean of 7, and their standard deviations are 7, 5 and 1, respectively. The last set has a small standard deviation because its values are grouped closely around the mean; the first set has the largest standard deviation because its values lie far from the mean.
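The three sets can be checked with Python's `statistics` module. The quoted standard deviations (7, 5 and 1) are the ones given by the σ formula, which divides by n (`statistics.pstdev`); the s formula (`statistics.stdev`, dividing by n − 1) would give larger values, about 8.08, 5.77 and 1.15.

```python
import statistics

sets = [(0, 0, 14, 14), (0, 6, 8, 14), (6, 6, 8, 8)]
for values in sets:
    # pstdev divides by n, matching the sigma formula used for these figures
    print(values, statistics.fmean(values), statistics.pstdev(values))
# (0, 0, 14, 14) 7.0 7.0
# (0, 6, 8, 14) 7.0 5.0
# (6, 6, 8, 8) 7.0 1.0
```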

    In a general sense, the standard deviation can be considered a measure of uncertainty. For example, in physics the standard deviation is used to determine the error of a series of successive measurements of some quantity. This value is very important for judging the plausibility of the phenomenon under study against the value predicted by theory: if the mean of the measurements differs from the theoretical prediction by much more than the standard deviation, the obtained values or the method of obtaining them should be rechecked. … identified with portfolio risk.

    Climate

    Suppose there are two cities with the same average maximum daily temperature, one on the coast and the other inland. Coastal cities are known to have a narrower range of maximum daytime temperatures than inland cities. Therefore, the standard deviation of the maximum daily temperatures will be smaller for the coastal city than for the inland one, even though their means are the same. In practice, this means that the maximum air temperature on any given day of the year is more likely to differ strongly from the mean in the inland city.

    Sport

    Suppose several football teams are evaluated by a set of parameters, for example the number of goals scored and conceded, scoring chances, etc. Most likely, the best team in the group will have better values on more of the parameters. The smaller a team's standard deviation for each of the parameters, the more predictable its results; such teams are balanced. Conversely, a team with a large standard deviation is hard to predict, which reflects its imbalance, e.g. a strong defense but a weak attack.

    Using the standard deviations of team parameters makes it possible, to some extent, to predict the result of a match between two teams by assessing the teams' strengths and weaknesses, and hence their chosen playing styles.
