LFM Stats And Pure > Measures of Location and Spread

gaokao 2015 Q18

A company conducted a survey of 20 users from regions A and B respectively to understand user satisfaction with its products. The satisfaction scores are as follows:
Region A: 62, 73, 81, 92, 95, 85, 74, 64, 53, 76, 78, 86, 95, 66, 97, 78, 88, 82, 76, 89
Region B: 73, 83, 62, 51, 91, 46, 53, 73, 64, 82, 93, 48, 65, 81, 74, 56, 54, 76, 65, 79
(I) Complete the stem-and-leaf plot for user satisfaction scores in both regions based on the two sets of data, and compare the mean and dispersion of satisfaction scores between the two regions through the stem-and-leaf plot (no need to calculate exact values, just draw conclusions);
(II) Based on user satisfaction scores, classify user satisfaction into three levels from low to high:
Satisfaction Score	Below 70	70 to 89	At least 90
Satisfaction Level	Dissatisfied	Satisfied	Very Satisfied
Let event $C$: ``The satisfaction level of users in region A is higher than that of users in region B''. Assume the evaluation results from the two regions are independent. Based on the given data, using the frequency of event occurrence as the probability of the corresponding event, find the probability of event $C$.
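
A minimal sketch of how the data above can be checked numerically (not part of the original question, which asks for a visual comparison in part (I)). It computes the mean and population variance of each region and, treating observed frequencies as probabilities, the probability of event $C$. The helper names `level` and `level_freq` are illustrative assumptions.

```python
from fractions import Fraction
from statistics import mean, pvariance

region_a = [62, 73, 81, 92, 95, 85, 74, 64, 53, 76,
            78, 86, 95, 66, 97, 78, 88, 82, 76, 89]
region_b = [73, 83, 62, 51, 91, 46, 53, 73, 64, 82,
            93, 48, 65, 81, 74, 56, 54, 76, 65, 79]

def level(score):
    """Map a score to 0 (dissatisfied), 1 (satisfied) or 2 (very satisfied)."""
    if score < 70:
        return 0
    return 1 if score < 90 else 2

def level_freq(scores):
    """Relative frequency of each level, kept exact with Fraction."""
    return [Fraction(sum(level(s) == k for s in scores), len(scores))
            for k in range(3)]

freq_a, freq_b = level_freq(region_a), level_freq(region_b)

# P(C): sum over all pairs where A's level is strictly higher than B's,
# using the independence of the two regions.
p_c = sum(freq_a[i] * freq_b[j]
          for i in range(3) for j in range(3) if i > j)

print("means:", mean(region_a), mean(region_b))
print("variances:", pvariance(region_a), pvariance(region_b))
print("P(C) =", p_c)
```

Under these assumptions, region A shows the higher mean and the smaller spread, which is the conclusion the stem-and-leaf plot is meant to suggest.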

gaokao 2017 Q2 5 marks

To evaluate the planting effectiveness of a crop, $n$ plots of experimental land were selected. The per-acre yields (in kg) of these $n$ plots are $x_1, x_2, \cdots, x_n$ respectively. Among the following indicators, which can be used to evaluate the stability of this crop's per-acre yield?
A. Mean of $x_1, x_2, \cdots, x_n$
B. Median of $x_1, x_2, \cdots, x_n$
C. Maximum value of $x_1, x_2, \cdots, x_n$
D. Standard deviation of $x_1, x_2, \cdots, x_n$

gaokao 2018 Q18 12 marks

To improve production efficiency, a factory conducted technological innovation activities and proposed two new production methods for completing a production task. To compare the efficiency of the two methods, 40 workers were selected and randomly divided into two groups of 20 each. The first group used the first production method, and the second group used the second production method. Based on the time (in minutes) taken by workers to complete the production task, a stem-and-leaf plot was drawn.
(1) Based on the stem-and-leaf plot, which production method has higher efficiency? Explain your reasoning.
(2) Find the median $m$ of the time taken by all 40 workers to complete the production task, and fill in the contingency table with the number of workers whose completion time exceeds $m$ and does not exceed $m$ for each production method.

gaokao 2019 Q5 5 marks

A speech competition has 9 judges who each give an original score for a contestant. When determining the contestant's final score, the highest and lowest scores are removed from the 9 original scores, leaving 7 valid scores. Compared with the 9 original scores, the numerical characteristic that remains unchanged for the 7 valid scores is
A. median
B. mean
C. variance
D. range
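
A small illustration of the idea, using hypothetical scores that are not taken from the exam paper: after sorting, removing one lowest and one highest score leaves the middle element in place, while the other statistics generally change.

```python
from statistics import mean, median, pvariance

# Hypothetical original scores from 9 judges (an assumption for illustration)
original = [7.5, 8.0, 8.2, 8.5, 8.8, 9.0, 9.1, 9.4, 9.9]
valid = sorted(original)[1:-1]  # drop one lowest and one highest score

def value_range(xs):
    """Range = largest value minus smallest value."""
    return max(xs) - min(xs)

for name, stat in [("median", median), ("mean", mean),
                   ("variance", pvariance), ("range", value_range)]:
    print(name, stat(original), stat(valid))
```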

gaokao 2019 Q13

13. A school will select one person from three candidates (A, B, C) to participate in the city-wide middle school boys' 1500-meter race. The mean and variance of their 10 recent training times (in seconds) are shown in the following table:

	A	B	C
Mean	280	280	290
Variance	20	16	16

Based on the data in the table, the school should select \_\_\_\_ to participate in the race.

gaokao 2019 Q13

13. China's high-speed rail development is rapid and technologically advanced. According to statistics, among high-speed trains stopping at a certain station, 10 trains have an on-time rate of 0.97, 20 trains have an on-time rate of 0.98, and 10 trains have an on-time rate of 0.99. The estimated value of the average on-time rate for all high-speed trains stopping at this station is $\_\_\_\_$ .

gaokao 2019 Q14

14. China's high-speed rail development is rapid and technologically advanced. According to statistics, among high-speed trains stopping at a certain station, 10 trains have an on-time rate of 0.97, 20 trains have an on-time rate of 0.98, and 10 trains have an on-time rate of 0.99. The estimated value of the average on-time rate for all trains stopping at this station is $\_\_\_\_$ .
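
The estimate asked for in the high-speed rail question is a weighted mean, with the number of trains as weights. A minimal sketch using exact fractions (variable names are illustrative):

```python
from fractions import Fraction

# (number of trains, on-time rate) as given in the question
groups = [(10, Fraction(97, 100)),
          (20, Fraction(98, 100)),
          (10, Fraction(99, 100))]

total_trains = sum(count for count, _ in groups)
# Weighted mean: each rate counts once per train that has it
weighted_rate = sum(count * rate for count, rate in groups) / total_trains

print(float(weighted_rate))
```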

gaokao 2020 Q3 5 marks

For a sample of data $x _ { 1 } , x _ { 2 } , \cdots , x _ { n }$ with variance 0.01, the variance of the data $10 x _ { 1 } , 10 x _ { 2 } , \cdots , 10 x _ { n }$ is
A. 0.01
B. 0.1
C. 1
D. 10
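
The relevant property is that multiplying every data value by a constant $c$ multiplies the variance by $c^2$. The sketch below checks this on a hypothetical two-point sample chosen so that its variance is exactly 0.01; the sample itself is an assumption, not data from the question.

```python
from fractions import Fraction
from statistics import pvariance

# Hypothetical sample with population variance exactly 1/100
xs = [Fraction(1, 10), Fraction(3, 10)]
scaled = [10 * x for x in xs]

# Scaling by 10 should multiply the variance by 10**2 = 100
print(pvariance(xs), pvariance(scaled))
```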

gaokao 2020 Q3 5 marks

In a sample of data, the frequencies of $1,2,3,4$ are $p _ { 1 } , p _ { 2 } , p _ { 3 } , p _ { 4 }$ respectively, and $\sum _ { i = 1 } ^ { 4 } p _ { i } = 1$ . Among the following four cases, the one with the largest standard deviation is
A. $p _ { 1 } = p _ { 4 } = 0.1 , p _ { 2 } = p _ { 3 } = 0.4$
B. $p _ { 1 } = p _ { 4 } = 0.4 , p _ { 2 } = p _ { 3 } = 0.1$
C. $p _ { 1 } = p _ { 4 } = 0.2 , p _ { 2 } = p _ { 3 } = 0.3$
D. $p _ { 1 } = p _ { 4 } = 0.3 , p _ { 2 } = p _ { 3 } = 0.2$
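
One way to compare the four cases is to compute each variance directly from $\sum p_i (i - \mu)^2$; the largest variance also gives the largest standard deviation. A minimal sketch with exact fractions (the dictionary layout is an illustrative choice):

```python
from fractions import Fraction as F

values = [1, 2, 3, 4]
options = {
    "A": [F(1, 10), F(4, 10), F(4, 10), F(1, 10)],
    "B": [F(4, 10), F(1, 10), F(1, 10), F(4, 10)],
    "C": [F(2, 10), F(3, 10), F(3, 10), F(2, 10)],
    "D": [F(3, 10), F(2, 10), F(2, 10), F(3, 10)],
}

def variance(probs):
    """Variance of the values 1..4 weighted by the given frequencies."""
    mu = sum(p * v for p, v in zip(probs, values))
    return sum(p * (v - mu) ** 2 for p, v in zip(probs, values))

variances = {name: variance(p) for name, p in options.items()}
largest = max(variances, key=variances.get)

for name, var in variances.items():
    print(name, float(var))
print("largest spread:", largest)
```

The result matches the intuition that spread is greatest when most of the weight sits on the extreme values 1 and 4.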

gaokao 2020 Q6 4 marks

Given that the median of $a, b, 1, 2$ is 3 and the mean is 4, find $ab =$ $\_\_\_\_$
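
A hedged brute-force check of this question: the mean condition gives $a + b = 4 \cdot 4 - 1 - 2 = 13$, so it is enough to scan candidate values of $a$ and test the median condition. The half-integer search grid is an assumption made for the sketch, not something stated in the question.

```python
from fractions import Fraction
from statistics import median

products = set()
# Mean 4 over four values means a + b = 13; scan a over half-integers
for k in range(-20, 47):
    a = Fraction(k, 2)
    b = 13 - a
    if median([a, b, 1, 2]) == 3:
        products.add(a * b)

print(products)
```

On this grid the only pairs that satisfy both conditions are $a = 4, b = 9$ and its mirror image, so the product is the same in both cases.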

gaokao 2021 Q9

9. CD

Solution: Based on the formulas for calculating the mean, median, standard deviation, and range of sample data, options $C$ and $D$ are correct.

gaokao 2021 Q9

9. Which of the following statistics can measure the dispersion of the sample $x _ { 1 } , x _ { 2 } , \cdots , x _ { n }$? ( )
A. The standard deviation of the sample $x _ { 1 } , x _ { 2 } , \cdots , x _ { n }$
B. The median of the sample $x _ { 1 } , x _ { 2 } , \cdots , x _ { n }$
C. The range of the sample $x _ { 1 } , x _ { 2 } , \cdots , x _ { n }$
D. The mean of the sample $x _ { 1 } , x _ { 2 } , \cdots , x _ { n }$
【Answer】AC
【Analysis】Determine which of the given options measure data dispersion and which measure central tendency.
【Detailed Solution】By definition, the standard deviation and the range measure the dispersion of the data, while the median and the mean measure central tendency. Therefore, the answer is AC.

IV. Answer Questions: This section contains 6 questions totaling 70 points. Solutions should include written explanations, proofs, or calculation steps.

gaokao 2022 Q9 5 marks

Which of the following statistics can measure the dispersion of the sample $x _ { 1 } , x _ { 2 } , \cdots , x _ { n }$?
A. The standard deviation of the sample $x _ { 1 } , x _ { 2 } , \cdots , x _ { n }$
B. The median of the sample $x _ { 1 } , x _ { 2 } , \cdots , x _ { n }$
C. The range of the sample $x _ { 1 } , x _ { 2 } , \cdots , x _ { n }$
D. The mean of the sample $x _ { 1 } , x _ { 2 } , \cdots , x _ { n }$

gaokao 2025 Q1 5 marks

The mean of the sample data 2, 8, 14, 16, 20 is ( )
A. 8
B. 9
C. 12
D. 18

germany-abitur 2020 QB 3c 2 marks

The random variable $Z$, which for a fair coin describes the number of occurrences of ``heads'' in four tosses, also has the expected value 2 and analogously $P ( Z = 2 ) = \frac { 3 } { 8 }$. Calculate the variance of $Z$, compare it with the variance of $Y$, and based on this describe a qualitative difference in the probability distributions of $Z$ and $Y$.
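
A minimal sketch for the first part of this task, assuming $Z$ is binomially distributed with $n = 4$ and $p = \frac{1}{2}$. It recomputes the stated values and the variance from the probability table. The random variable $Y$ is defined in an earlier part of the task that is not reproduced here, so the comparison itself is left out.

```python
from fractions import Fraction
from math import comb

n, p = 4, Fraction(1, 2)

# Probability table of Z: P(Z = k) for k = 0..4
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

expected = sum(k * q for k, q in pmf.items())
variance = sum((k - expected) ** 2 * q for k, q in pmf.items())

print("E(Z) =", expected, " P(Z=2) =", pmf[2], " Var(Z) =", variance)
```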

germany-abitur 2021 QB 2c 5 marks

Using the table of values, determine the smallest interval symmetric about the expected value in which the values of the random variable $X$ lie with a probability of at least $75 \%$.

germany-abitur 2023 QB 1c 2 marks

The random variable $X$ describes the number of cars with electric motors among the selected vehicles. Calculate the expected value and the standard deviation of $X$.

germany-abitur 2023 QB 1d 3 marks

For a certain value $n \in \{ 1 ; 2 ; 3 ; \ldots \}$, the binomially distributed random variables $Z _ { p }$ with parameters $n$ and $p$ are considered for $p \in ] 0 ; 1 [$. Show that among these random variables, the one with $p = 0.5$ has the greatest variance.
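
The variance of $Z_p$ is $n p (1 - p)$, and completing the square gives $n p (1 - p) = \frac{n}{4} - n \left( p - \frac{1}{2} \right)^2$, which is largest at $p = 0.5$. The sketch below is only a numerical illustration of that claim on a grid of $p$ values, not a proof; the value $n = 10$ and the grid are assumptions.

```python
from fractions import Fraction

def binomial_variance(n, p):
    """Variance of a binomial random variable with parameters n and p."""
    return n * p * (1 - p)

n = 10  # hypothetical value; the argument does not depend on n
grid = [Fraction(k, 100) for k in range(1, 100)]  # p = 0.01, ..., 0.99
best_p = max(grid, key=lambda p: binomial_variance(n, p))

print(best_p, binomial_variance(n, best_p))
```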

germany-abitur 2025 Qa 5 marks

To assess whether a machine works well, the mean of the filling quantity and the spread of the filling quantity are considered. A machine works better the closer the filling is on average to the value 330 ml and the smaller the spread is.
For the sample from Machine $A$, the mean of the filling quantity is 330 ml and the standard deviation is approximately $1.34 \mathrm { ml }$.
Investigate based on the samples which of the two machines works better.

grandes-ecoles 2016 Q2 Integral or Series Representation of Moments

Let $Y$ be a random variable taking values in $\mathbb{N}$ almost surely, and which admits an expectation. Show that $$\mathbb{E}(Y) = \sum_{k=1}^{+\infty} \mathbb{P}(Y \geqslant k)$$
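
Before proving the identity, it can help to check it on a small example. The sketch below uses a hypothetical distribution on $\{0, 1, 2, 3\}$ (an assumption for illustration) and compares $\mathbb{E}(Y)$ with the sum of the tail probabilities $\mathbb{P}(Y \geqslant k)$.

```python
from fractions import Fraction

# Hypothetical distribution of Y on {0, 1, 2, 3}
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

expectation = sum(k * p for k, p in pmf.items())

def tail(k):
    """P(Y >= k) for the distribution above."""
    return sum(p for j, p in pmf.items() if j >= k)

# For a bounded Y the series stops at the largest possible value
tail_sum = sum(tail(k) for k in range(1, max(pmf) + 1))

print(expectation, tail_sum)
```

The check works because each value $j$ is counted once in every tail $\mathbb{P}(Y \geqslant k)$ with $k \leqslant j$, that is, exactly $j$ times.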

grandes-ecoles 2017 QI.B.1

Suppose that $X$ admits a second moment. Let $\delta$ be an element of $\mathbb{R}^{+*}$. Show that, for $n$ in $\mathbb{N}^{*}$, $$P\left(\left|S_{n} - nE(X)\right| \geqslant n\delta\right) \leqslant \frac{V(X)}{n\delta^{2}}$$

grandes-ecoles 2017 Q1 Direct Proof of an Inequality

Let $m$ be a measure. Let $f : \mathbb { R } \rightarrow \mathbb { R }$ be a function that admits a variance relative to $m$. Show that $fm$ is integrable. As a consequence, the real $$\operatorname { Var } _ { m } ( f ) = \int f ( x ) ^ { 2 } m ( x ) d x - \left( \int f ( x ) m ( x ) d x \right) ^ { 2 }$$ is well defined. Show that $\operatorname { Var } _ { m } ( f ) \geqslant 0$.

grandes-ecoles 2017 Q2 Direct Proof of an Inequality

Let $m$ be a measure. Let $f : \mathbb { R } \rightarrow \mathbb { R }$ be a function that admits an entropy relative to $m$. We consider the function $h : [ 0 , + \infty [ \rightarrow \mathbb { R }$ defined by $h ( 0 ) = 0$ and for $x > 0$, $h ( x ) = x \ln ( x )$.
2a. Show that $f ^ { 2 } m$ is integrable. As a consequence, the real $$\operatorname { Ent } _ { m } ( f ) = \int h \left( f ( x ) ^ { 2 } \right) m ( x ) d x - h \left( \int f ( x ) ^ { 2 } m ( x ) d x \right)$$ is well defined.
2b. Let $a > 0$. Show that $$\forall x \geqslant 0 , \quad h ( x ) \geqslant ( x - a ) h ^ { \prime } ( a ) + h ( a ) ,$$ with strict inequality if $x \neq a$.
2c. Show that $\operatorname { Ent } _ { m } ( f ) \geqslant 0$. You may use the previous question with $a = \int f ( x ) ^ { 2 } m ( x ) d x$.
2d. We assume here that for all $x \in \mathbb { R } , m ( x ) > 0$. Characterize the functions $f$ such that $\operatorname { Ent } _ { m } ( f ) = 0$.

grandes-ecoles 2018 Q38 Expectation and Variance from Context-Based Random Variables

Let $\left(X_{n}\right)_{n \in \mathbb{N}}$ be a sequence of mutually independent Rademacher random variables and $S_{n} = \sum_{j=1}^{n} X_{j}$. For $x$ real, $\lfloor x \rfloor$ denotes the integer part of $x$. For all real numbers $\delta > 0$ and $\tau > 0$, calculate $\mathbb{V}\left(\delta S_{\lfloor 1/\tau \rfloor}\right)$, the variance of the random variable $\delta S_{\lfloor 1/\tau \rfloor}$.
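
As a hedged sanity check for small cases, the variance can be computed exactly by enumerating all equally likely sign sequences. The function name `exact_variance` and the sample values of $\delta$ and $\tau$ are assumptions for illustration; the enumeration grows as $2^{n}$, so it is only practical for small $\lfloor 1/\tau \rfloor$.

```python
from fractions import Fraction
from itertools import product
from math import floor

def exact_variance(delta, tau):
    """Variance of delta * S_n with n = floor(1/tau), by full enumeration."""
    delta, tau = Fraction(delta), Fraction(tau)
    n = floor(1 / tau)
    # Every sequence of n Rademacher signs has probability 2**(-n)
    outcomes = [delta * sum(signs) for signs in product((-1, 1), repeat=n)]
    mean = sum(outcomes) / len(outcomes)
    return sum((x - mean) ** 2 for x in outcomes) / len(outcomes)

print(exact_variance(3, Fraction(1, 4)))               # n = 4
print(exact_variance(Fraction(1, 2), Fraction(3, 10)))  # n = 3
```

In both cases the result agrees with $\delta^{2} \lfloor 1/\tau \rfloor$, which follows from independence and the fact that each Rademacher variable has variance 1.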

grandes-ecoles 2018 Q39 Limit and Convergence of Probabilistic Quantities

Let $\left(X_{n}\right)_{n \in \mathbb{N}}$ be a sequence of mutually independent Rademacher random variables and $S_{n} = \sum_{j=1}^{n} X_{j}$. Show that, for every real number $\delta$, $\mathbb{V}\left(\delta S_{\lfloor 1/\tau \rfloor}\right)$ is equivalent to $\frac{\delta^{2}}{\tau}$, as $\tau$ tends to 0 from above.
