Measures of Location and Spread

A company conducted a survey of 20 users from regions A and B respectively to understand user satisfaction with its products. The satisfaction scores are as follows:
Region A: 62, 73, 81, 92, 95, 85, 74, 64, 53, 76, 78, 86, 95, 66, 97, 78, 88, 82, 76, 89
Region B: 73, 83, 62, 51, 91, 46, 53, 73, 64, 82, 93, 48, 65, 81, 74, 56, 54, 76, 65, 79
(I) Complete the stem-and-leaf plot for user satisfaction scores in both regions based on the two sets of data, and compare the mean and dispersion of satisfaction scores between the two regions through the stem-and-leaf plot (no need to calculate exact values, just draw conclusions);
(II) Based on user satisfaction scores, classify user satisfaction into three levels from low to high:
Satisfaction Score | Below 70 | 70 to 89 | At least 90
Satisfaction Level | Dissatisfied | Satisfied | Very Satisfied
Let event $C$: ``The satisfaction level of users in region A is higher than that of users in region B''. Assume the evaluation results from the two regions are independent. Based on the given data, using the frequency of event occurrence as the probability of the corresponding event, find the probability of event $C$.
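As a rough check of part (II), the frequency-based calculation can be sketched in Python. The helper names `level` and `level_frequencies` and the 0/1/2 coding of the levels are illustrative choices, not part of the original problem.

```python
from fractions import Fraction

region_a = [62, 73, 81, 92, 95, 85, 74, 64, 53, 76, 78, 86, 95, 66, 97, 78, 88, 82, 76, 89]
region_b = [73, 83, 62, 51, 91, 46, 53, 73, 64, 82, 93, 48, 65, 81, 74, 56, 54, 76, 65, 79]

def level(score):
    """Map a score to 0 (dissatisfied), 1 (satisfied) or 2 (very satisfied)."""
    if score < 70:
        return 0
    return 1 if score < 90 else 2

def level_frequencies(scores):
    """Relative frequency of each level, used as an estimated probability."""
    return [Fraction(sum(level(s) == k for s in scores), len(scores)) for k in range(3)]

freq_a = level_frequencies(region_a)
freq_b = level_frequencies(region_b)

# C occurs when A's level is strictly higher than B's.
# Independence of the two regions lets us multiply the frequencies.
p_c = sum(freq_a[i] * freq_b[j] for i in range(3) for j in range(3) if i > j)
print(freq_a, freq_b, float(p_c))
```

Under these assumptions the estimate is $P(C) = \frac{12}{20}\cdot\frac{10}{20} + \frac{4}{20}\cdot\frac{18}{20} = 0.48$.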
gaokao 2017 Q2 5 marks
To evaluate the planting effectiveness of a crop, $n$ plots of experimental land were selected. The per-acre yields (in kg) of these $n$ plots are $x_1, x_2, \cdots, x_n$ respectively. Among the following indicators, which can be used to evaluate the stability of this crop's per-acre yield?
A. Mean of $x_1, x_2, \cdots, x_n$
B. Median of $x_1, x_2, \cdots, x_n$
C. Maximum value of $x_1, x_2, \cdots, x_n$
D. Standard deviation of $x_1, x_2, \cdots, x_n$
gaokao 2018 Q18 12 marks
To improve production efficiency, a factory conducted technological innovation activities and proposed two new production methods for completing a production task. To compare the efficiency of the two methods, 40 workers were selected and randomly divided into two groups of 20 each. The first group used the first production method, and the second group used the second production method. Based on the time (in minutes) taken by workers to complete the production task, a stem-and-leaf plot was drawn.
(1) Based on the stem-and-leaf plot, which production method has higher efficiency? Explain your reasoning.
(2) Find the median $m$ of the time taken by all 40 workers to complete the production task, and fill in the contingency table with the number of workers whose completion time exceeds $m$ and does not exceed $m$ for each production method.
gaokao 2019 Q5 5 marks
A speech competition has 9 judges who each give an original score for a contestant. When determining the contestant's final score, the highest and lowest scores are removed from the 9 original scores, leaving 7 valid scores. Compared with the 9 original scores, the numerical characteristic that remains unchanged for the 7 valid scores is
A. median
B. mean
C. variance
D. range
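A small numerical illustration may help here. The nine scores below are hypothetical (the problem gives none); the sketch only shows that dropping the single highest and lowest values leaves the middle value of the sorted list in place, while the range changes.

```python
from statistics import median

# Hypothetical scores from 9 judges (not from the original problem).
original = [8.5, 9.0, 7.5, 9.5, 8.0, 9.2, 8.8, 7.0, 9.9]

# Drop the lowest and the highest score to get the 7 valid scores.
valid = sorted(original)[1:-1]

median_original = median(original)
median_valid = median(valid)
range_original = max(original) - min(original)
range_valid = max(valid) - min(valid)
print(median_original, median_valid, range_original, range_valid)
```

The 5th of 9 sorted values is also the 4th of the 7 remaining values, which is why the median is the characteristic that stays fixed.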
13. A school will select one person from three candidates (A, B, C) to participate in the city-wide middle school boys' 1500-meter race. The mean and variance of their 10 recent training times (in seconds) are shown in the following table:
Candidate | A | B | C
Mean | 280 | 280 | 290
Variance | 20 | 16 | 16

Based on the data in the table, the school should select \_\_\_\_ to participate in the race.
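One way to read the table is sketched below, assuming that a lower mean time is preferred first and that, among equal means, a lower variance (more consistent performance) breaks the tie. This ordering is an assumption about the intended reasoning, not something the problem states.

```python
# (mean time in seconds, variance) for each candidate, as given in the table.
stats = {"A": (280, 20), "B": (280, 16), "C": (290, 16)}

# Tuples compare element by element: first by mean, then by variance.
best = min(stats, key=lambda name: stats[name])
print(best)
```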
13. China's high-speed rail development is rapid and technologically advanced. According to statistics, among high-speed trains stopping at a certain station, 10 trains have an on-time rate of 0.97, 20 trains have an on-time rate of 0.98, and 10 trains have an on-time rate of 0.99. The estimated value of the average on-time rate for all high-speed trains stopping at this station is $\_\_\_\_$ .
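The estimate asked for is a weighted mean of the three on-time rates, with the train counts as weights. A minimal sketch, using exact fractions to avoid rounding noise:

```python
from fractions import Fraction

# (number of trains, on-time rate) groups from the problem statement.
groups = [(10, Fraction(97, 100)), (20, Fraction(98, 100)), (10, Fraction(99, 100))]

total = sum(count for count, _ in groups)
average = sum(count * rate for count, rate in groups) / total
print(total, float(average))
```

This gives $\frac{10 \times 0.97 + 20 \times 0.98 + 10 \times 0.99}{40} = 0.98$.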
gaokao 2020 Q3 5 marks
For a sample of data $x _ { 1 } , x _ { 2 } , \cdots , x _ { n }$ with variance 0.01, the variance of the data $10 x _ { 1 } , 10 x _ { 2 } , \cdots , 10 x _ { n }$ is
A. 0.01
B. 0.1
C. 1
D. 10
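The scaling rule $\operatorname{Var}(ax) = a^2 \operatorname{Var}(x)$ can be checked numerically. The two-point sample below is hypothetical, chosen only so that its population variance is exactly 0.01.

```python
from fractions import Fraction
from statistics import pvariance

# Hypothetical sample with mean 1 and population variance 1/100.
sample = [Fraction(9, 10), Fraction(11, 10)]
scaled = [10 * x for x in sample]

var_sample = pvariance(sample)
var_scaled = pvariance(scaled)
print(var_sample, var_scaled)
```

Multiplying every value by 10 multiplies the variance by $10^2 = 100$, so $0.01$ becomes $1$.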
gaokao 2020 Q3 5 marks
In a sample of data, the frequencies of $1,2,3,4$ are $p _ { 1 } , p _ { 2 } , p _ { 3 } , p _ { 4 }$ respectively, and $\sum _ { i = 1 } ^ { 4 } p _ { i } = 1$ . Among the following four cases, the one with the largest standard deviation is
A. $p _ { 1 } = p _ { 4 } = 0.1 , p _ { 2 } = p _ { 3 } = 0.4$
B. $p _ { 1 } = p _ { 4 } = 0.4 , p _ { 2 } = p _ { 3 } = 0.1$
C. $p _ { 1 } = p _ { 4 } = 0.2 , p _ { 2 } = p _ { 3 } = 0.3$
D. $p _ { 1 } = p _ { 4 } = 0.3 , p _ { 2 } = p _ { 3 } = 0.2$
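The four cases can be compared directly by computing each variance from the frequencies. The sketch below is one possible check; the helper name `variance` is illustrative.

```python
from fractions import Fraction as F

values = [1, 2, 3, 4]
cases = {
    "A": [F(1, 10), F(4, 10), F(4, 10), F(1, 10)],
    "B": [F(4, 10), F(1, 10), F(1, 10), F(4, 10)],
    "C": [F(2, 10), F(3, 10), F(3, 10), F(2, 10)],
    "D": [F(3, 10), F(2, 10), F(2, 10), F(3, 10)],
}

def variance(freqs):
    """Variance of the values 1..4 weighted by their relative frequencies."""
    mean = sum(v * p for v, p in zip(values, freqs))
    return sum(p * (v - mean) ** 2 for v, p in zip(values, freqs))

variances = {name: variance(freqs) for name, freqs in cases.items()}
# The standard deviation is the square root of the variance,
# so the largest variance also gives the largest standard deviation.
largest = max(variances, key=variances.get)
print({k: float(v) for k, v in variances.items()}, largest)
```

Every case has mean 2.5 by symmetry, so the spread grows as more weight sits on the extreme values 1 and 4.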
gaokao 2020 Q6 4 marks
Given that the median of $a, b, 1, 2$ is 3 and the mean is 4, find $ab =$ $\_\_\_\_$
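A possible line of reasoning, assuming without loss of generality that $a \leqslant b$: the mean condition gives $$a + b + 1 + 2 = 4 \times 4 = 16 \quad\Longrightarrow\quad a + b = 13 .$$ If $a \leqslant 2$, the two middle values of the sorted data would both be at most 2, so the median could not be 3. Hence $a > 2$, the sorted data are $1, 2, a, b$, and $$\frac{2 + a}{2} = 3 \quad\Longrightarrow\quad a = 4, \quad b = 9, \quad ab = 36 .$$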
9. CD
Solution: Based on the formulas for calculating the mean, median, standard deviation, and range of sample data, options $C$ and $D$ are correct.
9. Which of the following statistics can measure the dispersion of the sample $x _ { 1 } , x _ { 2 } , \cdots , x _ { n }$? ( )
A. The standard deviation of the sample $x _ { 1 } , x _ { 2 } , \cdots , x _ { n }$
B. The median of the sample $x _ { 1 } , x _ { 2 } , \cdots , x _ { n }$
C. The range of the sample $x _ { 1 } , x _ { 2 } , \cdots , x _ { n }$
D. The mean of the sample $x _ { 1 } , x _ { 2 } , \cdots , x _ { n }$
【Answer】AC 【Solution】 【Analysis】Determine which of the given options measure data dispersion and which measure central tendency.
【Detailed Solution】By the definition of standard deviation, standard deviation measures data dispersion. By the definition of median, median measures central tendency. By the definition of range, range measures data dispersion. By the definition of mean, mean measures central tendency. Therefore, the answer is: AC.
gaokao 2022 Q9 5 marks
Which of the following statistics can measure the dispersion of the sample $x _ { 1 } , x _ { 2 } , \cdots , x _ { n }$?
A. The standard deviation of the sample $x _ { 1 } , x _ { 2 } , \cdots , x _ { n }$
B. The median of the sample $x _ { 1 } , x _ { 2 } , \cdots , x _ { n }$
C. The range of the sample $x _ { 1 } , x _ { 2 } , \cdots , x _ { n }$
D. (option D not fully provided in source)
gaokao 2025 Q1 5 marks
The mean of the sample data 2, 8, 14, 16, 20 is ( )
A. 8
B. 9
C. 12
D. 18
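The arithmetic is short enough to write out: $$\bar{x} = \frac{2 + 8 + 14 + 16 + 20}{5} = \frac{60}{5} = 12 ,$$ which corresponds to option C.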
The random variable $Z$, which for a fair coin describes the number of occurrences of ``heads'' in four tosses, also has the expected value 2 and analogously $P ( Z = 2 ) = \frac { 3 } { 8 }$. Calculate the variance of $Z$, compare it with the variance of $Y$, and based on this describe a qualitative difference in the probability distributions of $Z$ and $Y$.
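The variance of $Z$ can be checked from its probability mass function, assuming $Z$ is binomially distributed with $n = 4$ and $p = 0.5$ as the description suggests. The variable $Y$ is defined outside this excerpt, so the sketch covers $Z$ only.

```python
from fractions import Fraction
from math import comb

n, p = 4, Fraction(1, 2)

# Binomial probabilities P(Z = k) for k = 0..4.
pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

mean = sum(k * q for k, q in pmf.items())
variance = sum(q * (k - mean) ** 2 for k, q in pmf.items())
print(pmf[2], mean, variance)
```

The result agrees with the closed form $np(1-p) = 4 \cdot 0.5 \cdot 0.5 = 1$.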
Using the table of values, determine the smallest interval symmetric about the expected value in which the values of the random variable $X$ lie with a probability of at least $75 \%$.
The random variable $X$ describes the number of cars with electric motors among the selected vehicles. Calculate the expected value and the standard deviation of $X$.
For a certain value $n \in \{ 1 ; 2 ; 3 ; \ldots \}$, the binomially distributed random variables $Z _ { p }$ with parameters $n$ and $p$ are considered for $p \in ] 0 ; 1 [$. Show that among these random variables, the one with $p = 0.5$ has the greatest variance.
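One way to see this is to complete the square in the standard binomial variance formula: $$\operatorname{Var}(Z_p) = n p (1 - p) = n \left( \frac{1}{4} - \left( p - \frac{1}{2} \right)^2 \right) \leqslant \frac{n}{4} ,$$ with equality exactly when $p = \frac{1}{2}$.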
To assess whether a machine works well, the mean of the filling quantity and the spread of the filling quantity are considered. A machine works better the closer the filling is on average to the value 330 ml and the smaller the spread is.
For the sample from Machine $A$, the mean of the filling quantity is 330 ml and the standard deviation is approximately $1.34 \mathrm { ml }$.
Investigate based on the samples which of the two machines works better.
Let $Y$ be a random variable taking values in $\mathbb{N}$ almost surely, and which admits an expectation. Show that $$\mathbb{E}(Y) = \sum_{k=1}^{+\infty} \mathbb{P}(Y \geqslant k)$$
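A sketch of the usual argument, writing each integer $n$ as a sum of $n$ ones and exchanging the order of summation (allowed because all terms are nonnegative): $$\mathbb{E}(Y) = \sum_{n=1}^{+\infty} n \, \mathbb{P}(Y = n) = \sum_{n=1}^{+\infty} \sum_{k=1}^{n} \mathbb{P}(Y = n) = \sum_{k=1}^{+\infty} \sum_{n=k}^{+\infty} \mathbb{P}(Y = n) = \sum_{k=1}^{+\infty} \mathbb{P}(Y \geqslant k) .$$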
Suppose that $X$ admits a second moment. Let $\delta$ be an element of $\mathbb{R}^{+*}$. Show that, for $n$ in $\mathbb{N}^{*}$, $$P\left(\left|S_{n} - nE(X)\right| \geqslant n\delta\right) \leqslant \frac{V(X)}{n\delta^{2}}$$
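The definition of $S_n$ lies outside this excerpt; the sketch below assumes $S_n = X_1 + \cdots + X_n$ with the $X_i$ independent and distributed like $X$. Under that assumption $E(S_n) = nE(X)$ and $V(S_n) = nV(X)$, and the Bienaymé-Chebyshev inequality with threshold $n\delta > 0$ gives $$P\left(\left|S_{n} - nE(X)\right| \geqslant n\delta\right) \leqslant \frac{V(S_n)}{(n\delta)^{2}} = \frac{nV(X)}{n^{2}\delta^{2}} = \frac{V(X)}{n\delta^{2}} .$$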
Let $m$ be a measure. Let $f : \mathbb { R } \rightarrow \mathbb { R }$ be a function that admits a variance relative to $m$. Show that $fm$ is integrable. As a consequence, the real $$\operatorname { Var } _ { m } ( f ) = \int f ( x ) ^ { 2 } m ( x ) d x - \left( \int f ( x ) m ( x ) d x \right) ^ { 2 }$$ is well defined. Show that $\operatorname { Var } _ { m } ( f ) \geqslant 0$.
Let $m$ be a measure. Let $f : \mathbb { R } \rightarrow \mathbb { R }$ be a function that admits an entropy relative to $m$. We consider the function $h : [ 0 , + \infty [ \rightarrow \mathbb { R }$ defined by $h ( 0 ) = 0$ and for $x > 0$, $h ( x ) = x \ln ( x )$.
2a. Show that $f ^ { 2 } m$ is integrable. As a consequence, the real $$\operatorname { Ent } _ { m } ( f ) = \int h \left( f ( x ) ^ { 2 } \right) m ( x ) d x - h \left( \int f ( x ) ^ { 2 } m ( x ) d x \right)$$ is well defined.
2b. Let $a > 0$. Show that $$\forall x \geqslant 0 , \quad h ( x ) \geqslant ( x - a ) h ^ { \prime } ( a ) + h ( a ) ,$$ with strict inequality if $x \neq a$.
2c. Show that $\operatorname { Ent } _ { m } ( f ) \geqslant 0$. You may use the previous question with $a = \int f ( x ) ^ { 2 } m ( x ) d x$.
2d. We assume here that for all $x \in \mathbb { R } , m ( x ) > 0$. Characterize the functions $f$ such that $\operatorname { Ent } _ { m } ( f ) = 0$.
Let $\left(X_{n}\right)_{n \in \mathbb{N}}$ be a sequence of mutually independent Rademacher random variables and $S_{n} = \sum_{j=1}^{n} X_{j}$. For $x$ real, $\lfloor x \rfloor$ denotes the integer part of $x$. For all real numbers $\delta > 0$ and $\tau > 0$, calculate $\mathbb{V}\left(\delta S_{\lfloor 1/\tau \rfloor}\right)$, the variance of the random variable $\delta S_{\lfloor 1/\tau \rfloor}$.
Let $\left(X_{n}\right)_{n \in \mathbb{N}}$ be a sequence of mutually independent Rademacher random variables and $S_{n} = \sum_{j=1}^{n} X_{j}$. Show that, for every real number $\delta$, $\mathbb{V}\left(\delta S_{\lfloor 1/\tau \rfloor}\right)$ is equivalent to $\frac{\delta^{2}}{\tau}$, as $\tau$ tends to 0 from above.
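A sketch of both computations, assuming the usual definition of a Rademacher variable ($X_j = \pm 1$ with probability $\frac{1}{2}$ each). Then $\mathbb{E}(X_j) = 0$ and $\mathbb{V}(X_j) = \mathbb{E}(X_j^2) = 1$, so by independence $$\mathbb{V}\left(\delta S_{\lfloor 1/\tau \rfloor}\right) = \delta^{2} \sum_{j=1}^{\lfloor 1/\tau \rfloor} \mathbb{V}(X_j) = \delta^{2} \left\lfloor \frac{1}{\tau} \right\rfloor .$$ For the equivalence, the bounds $\frac{1}{\tau} - 1 < \left\lfloor \frac{1}{\tau} \right\rfloor \leqslant \frac{1}{\tau}$ give $$1 - \tau < \tau \left\lfloor \frac{1}{\tau} \right\rfloor \leqslant 1 ,$$ so $\tau \lfloor 1/\tau \rfloor \to 1$ as $\tau \to 0^{+}$, and for $\delta \neq 0$ the ratio of $\delta^{2} \lfloor 1/\tau \rfloor$ to $\frac{\delta^{2}}{\tau}$ tends to 1.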