UFM Statistics

gaokao 2015 Q4
4. To understand the relationship between annual family income and annual expenditure in a certain community, 5 families were randomly surveyed, and the following statistical data table was obtained:
\begin{tabular}{|c|c|c|c|c|c|}
\hline
Income $x$ (ten thousand yuan) & 8.2 & 8.6 & 10.0 & 11.3 & 11.9 \\
\hline
Expenditure $y$ (ten thousand yuan) & 6.2 & 7.5 & 8.0 & 8.5 & 9.8 \\
\hline
\end{tabular}
Based on the table, the regression line equation $\hat { y } = \hat { b } x + \hat { a }$ can be obtained, where $\hat { b } = 0.76$ and $\hat { a } = \bar { y } - \hat { b } \bar { x }$. Based on this, the estimated annual expenditure for a family in this community with an income of 15 ten thousand yuan is
A. 11.4 ten thousand yuan
B. 11.8 ten thousand yuan
C. 12.0 ten thousand yuan
D. 12.2 ten thousand yuan
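The arithmetic behind this estimate can be checked with a minimal Python sketch. It assumes the five table values as listed in the question; the variable names are illustrative and not part of the original problem.

```python
# Survey data from the question (units: ten thousand yuan).
incomes = [8.2, 8.6, 10.0, 11.3, 11.9]
expenditures = [6.2, 7.5, 8.0, 8.5, 9.8]

b_hat = 0.76  # slope given in the question

# The intercept follows from a_hat = y_bar - b_hat * x_bar.
x_bar = sum(incomes) / len(incomes)
y_bar = sum(expenditures) / len(expenditures)
a_hat = y_bar - b_hat * x_bar

# Estimated expenditure for an income of 15 (ten thousand yuan).
estimate = b_hat * 15 + a_hat
print(round(x_bar, 2), round(y_bar, 2), round(a_hat, 2), round(estimate, 2))
```

Under these assumptions the means are 10 and 8, the intercept is about 0.4, and the estimate is about 11.8 ten thousand yuan.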
gaokao 2015 Q4
4. Given that variables x and y satisfy the relationship $y = - 0.1 x + 1$, and variable y is positively correlated with z, the correct conclusion is
A. x and y are positively correlated, x and z are negatively correlated
B. x and y are positively correlated, x and z are positively correlated
C. x and y are negatively correlated, x and z are negatively correlated
D. x and y are negatively correlated, x and z are positively correlated
gaokao 2015 Q17 13 marks
With the development of China's economy, residents' savings deposits have increased year by year. The table below shows the year-end balance of urban and rural residents' RMB savings deposits in a certain region:
\begin{tabular}{|l|c|c|c|c|c|}
\hline
Year & 2010 & 2011 & 2012 & 2013 & 2014 \\
\hline
Time code $t$ & 1 & 2 & 3 & 4 & 5 \\
\hline
Savings deposits $y$ (hundred billion yuan) & 5 & 6 & 7 & 8 & 10 \\
\hline
\end{tabular}
(I) Find the regression equation $\hat { y } = \hat { b } t + \hat { a }$ for $y$ with respect to $t$.
(II) Use the regression equation to predict the RMB savings deposits in this region for 2015 ($t = 6$).
Note: In the regression equation $\hat { y } = \hat { b } t + \hat { a }$,
$$\hat { b } = \frac { \sum _ { i = 1 } ^ { n } t _ { i } y _ { i } - n \bar { t } \bar { y } } { \sum _ { i = 1 } ^ { n } t _ { i } ^ { 2 } - n \bar { t } ^ { 2 } } , \quad \hat { a } = \bar { y } - \hat { b } \bar { t }$$
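The formulas in the note translate directly into code. The following is a minimal Python sketch, assuming the time codes 1 to 5 and the deposits 5, 6, 7, 8, 10 from the table; the function name `fit_line` is hypothetical and used only for illustration.

```python
def fit_line(ts, ys):
    """Return (b_hat, a_hat) using the formulas quoted in the note."""
    n = len(ts)
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    # Numerator: sum of t_i * y_i minus n * t_bar * y_bar.
    numerator = sum(t * y for t, y in zip(ts, ys)) - n * t_bar * y_bar
    # Denominator: sum of t_i squared minus n * t_bar squared.
    denominator = sum(t * t for t in ts) - n * t_bar ** 2
    b_hat = numerator / denominator
    a_hat = y_bar - b_hat * t_bar
    return b_hat, a_hat


time_codes = [1, 2, 3, 4, 5]
deposits = [5, 6, 7, 8, 10]  # hundred billion yuan

b_hat, a_hat = fit_line(time_codes, deposits)
prediction_2015 = b_hat * 6 + a_hat  # t = 6 corresponds to 2015
print(round(b_hat, 2), round(a_hat, 2), round(prediction_2015, 2))
```

With these inputs the sketch gives a slope of about 1.2, an intercept of about 3.6, and a predicted value of about 10.8 hundred billion yuan for $t = 6$.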
gaokao 2018 Q18 12 marks
The figure below is a line graph of environmental infrastructure investment $y$ (in units of 100 million yuan) from 2000 to 2016 in a certain region.
To predict the environmental infrastructure investment in 2018 for this region, two linear regression models for $y$ and time variable $t$ were established. Based on data from 2000 to 2016 (time variable $t$ takes values $1,2 , \cdots , 17$ respectively), Model (1) was established: $\hat { y } = - 30.4 + 13.5 t$. Based on data from 2010 to 2016 (time variable $t$ takes values $1,2 , \cdots , 7$ respectively), Model (2) was established: $\hat { y } = 99 + 17.5 t$.
(1) Using each of these two models, find the predicted value of environmental infrastructure investment for 2018 in this region;
(2) Which model's prediction do you think is more reliable? Explain your reasoning.
gaokao 2018 Q18 12 marks
(12 points)
The figure below is a line graph of environmental infrastructure investment $y$ (in units of 100 million yuan) from 2000 to 2016 in a certain region.
To forecast the environmental infrastructure investment for 2018 in this region, two linear regression models were established for $y$ and time variable $t$. Based on data from 2000 to 2016 (time variable $t$ takes values $1, 2, \ldots, 17$ respectively), Model (1) is established: $\hat { y } = - 30.4 + 13.5 t$; based on data from 2010 to 2016 (time variable $t$ takes values $1, 2, \ldots, 7$ respectively), Model (2) is established: $\hat { y } = 99 + 17.5 t$.
(1) Using each of these two models respectively, predict the environmental infrastructure investment for 2018 in this region;
(2) Which model do you think gives a more reliable prediction? Explain your reasoning.
gaokao 2020 Q5 5 marks
A study group at a school conducted seed germination experiments under 20 different temperature conditions to investigate the relationship between the germination rate $y$ of a certain crop seed and temperature $x$ (in ${}^{\circ}\mathrm{C}$). From the experimental data $\left( x _ { i } , y _ { i } \right) ( i = 1,2 , \cdots , 20 )$, a scatter plot was obtained. Based on this scatter plot, between $10^{\circ}\mathrm{C}$ and $40^{\circ}\mathrm{C}$, which of the following four regression equation types is most suitable as the regression equation type for the germination rate $y$ and temperature $x$?
A. $y = a + b x$
B. $y = a + b x ^ { 2 }$
C. $y = a + b \mathrm { e } ^ { x }$
D. $y = a + b \ln x$
gaokao 2020 Q5 5 marks
A student research group at a school conducted an experiment to study the relationship between the germination rate $y$ of a certain crop seed and temperature $x$ (in units of ${}^{\circ}\mathrm{C}$). Under 20 different temperature conditions, seed germination experiments were performed. From the experimental data $\left( x _ { i } , y _ { i } \right)$ ($i = 1,2 , \cdots , 20$), a scatter plot was obtained. From this scatter plot, between $10^{\circ}\mathrm{C}$ and $40^{\circ}\mathrm{C}$, which of the following four regression equation types is most suitable as the regression equation type for the germination rate $y$ and temperature $x$?
A. $y = a + b x$
B. $y = a + b x ^ { 2 }$
C. $y = a + b e ^ { x }$
D. $y = a + b \ln x$
gaokao 2020 Q18 12 marks
After treatment, the ecosystem in a desert region has improved significantly, and the number of wild animals has increased. To investigate the population of a certain wild animal species in this region, the area is divided into 200 plots of similar size. A simple random sample of 20 plots is selected as sample areas. The sample data obtained is $\left( x _ { i } , y _ { i } \right) ( i = 1,2 , \cdots , 20 )$ , where $x _ { i }$ and $y _ { i }$ represent the plant coverage area (in hectares) and the number of this wild animal species in the $i$-th sample area, respectively. The following calculations are obtained: $\sum _ { i = 1 } ^ { 20 } x _ { i } = 60 , \sum _ { i = 1 } ^ { 20 } y _ { i } = 1200 , \sum _ { i = 1 } ^ { 20 } \left( x _ { i } - \bar { x } \right) ^ { 2 } = 80 , \sum _ { i = 1 } ^ { 20 } \left( y _ { i } - \bar { y } \right) ^ { 2 } = 9000 , \sum _ { i = 1 } ^ { 20 } \left( x _ { i } - \bar { x } \right) \left( y _ { i } - \bar { y } \right) = 800$ .
(1) Find the estimated value of the population of this wild animal species in the region (the estimated value equals the average number of this wild animal species in the sample areas multiplied by the number of plots);
(2) Find the correlation coefficient of the sample $\left( x _ { i } , y _ { i } \right) ( i = 1,2 , \cdots , 20 )$ (accurate to 0.01);
(3) Based on current statistical data, there is great variation in plant coverage area among different plots. To improve the representativeness of the sample and obtain a more accurate estimate of the population of this wild animal species in the region, please suggest a more reasonable sampling method and explain your reasoning.
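Parts (1) and (2) reduce to arithmetic on the given sums. The following is a minimal Python sketch, assuming the standard sample correlation coefficient $r = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}$; the variable names are illustrative.

```python
import math

n_sample = 20   # sampled plots
n_plots = 200   # plots in the whole region
sum_y = 1200    # total animal count over the sampled plots

# Centered sums given in the question.
sxx = 80
syy = 9000
sxy = 800

# Part (1): sample mean per plot times the number of plots.
population_estimate = sum_y / n_sample * n_plots

# Part (2): sample correlation coefficient.
r = sxy / math.sqrt(sxx * syy)
print(population_estimate, round(r, 2))
```

Under these assumptions the population estimate is 12000 and the correlation coefficient rounds to 0.94.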
gaokao 2022 Q19 12 marks
After years of environmental remediation, a certain region has transformed barren mountains into green mountains and clear waters. To estimate the total timber volume of a certain tree species in a forest area, 10 trees of this species were randomly selected. The cross-sectional area at the base (in $\mathrm { m } ^ { 2 }$ ) and timber volume (in $\mathrm { m } ^ { 3 }$ ) of each tree were measured, yielding the following data:
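\begin{tabular}{|l|c|c|c|c|c|c|c|c|c|c|c|}
\hline
Sample number $i$ & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & Total \\
\hline
Base cross-sectional area $x _ { i }$ & 0.04 & 0.06 & 0.04 & 0.08 & 0.08 & 0.05 & 0.05 & 0.07 & 0.07 & 0.06 & 0.6 \\
\hline
Timber volume $y _ { i }$ & 0.25 & 0.40 & 0.22 & 0.54 & 0.51 & 0.34 & 0.36 & 0.46 & 0.42 & 0.40 & 3.9 \\
\hline
\end{tabular}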

It is calculated that $\sum _ { i = 1 } ^ { 10 } x _ { i } ^ { 2 } = 0.038 , ~ \sum _ { i = 1 } ^ { 10 } y _ { i } ^ { 2 } = 1.6158 , \sum _ { i = 1 } ^ { 10 } x _ { i } y _ { i } = 0.2474$ .
(1) Estimate the average base cross-sectional area and average timber volume per tree of this species in the forest area;
(2) Find the sample correlation coefficient between the base cross-sectional area and timber volume of this tree species (accurate to 0.01);
(3) The base cross-sectional area of all trees of this species in the forest area was measured, and the total base cross-sectional area of all such trees is $186 \mathrm {~m} ^ { 2 }$ . Given that the timber volume of a tree is approximately proportional to its base cross-sectional area, use the above data to estimate the total timber volume of this tree species in the forest area.
Note: Correlation coefficient $r = \frac { \sum _ { i = 1 } ^ { n } \left( x _ { i } - \bar { x } \right) \left( y _ { i } - \bar { y } \right) } { \sqrt { \sum _ { i = 1 } ^ { n } \left( x _ { i } - \bar { x } \right) ^ { 2 } \sum _ { i = 1 } ^ { n } \left( y _ { i } - \bar { y } \right) ^ { 2 } } } , \sqrt { 1.896 } \approx 1.377$ .
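The quantities in this question can be checked from the raw sums alone. The following is a minimal Python sketch, assuming the identities $\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - n \bar{x} \bar{y}$ and $\sum (x_i - \bar{x})^2 = \sum x_i^2 - n \bar{x}^2$, and reading "approximately proportional" in part (3) as scaling the total area by the ratio of sample totals; the variable names are illustrative.

```python
import math

n = 10
sum_x, sum_y = 0.6, 3.9                            # totals from the table
sum_x2, sum_y2, sum_xy = 0.038, 1.6158, 0.2474     # sums given in the question

# Part (1): sample means.
x_bar = sum_x / n
y_bar = sum_y / n

# Part (2): centered sums via the raw-sum identities, then r.
sxy = sum_xy - n * x_bar * y_bar
sxx = sum_x2 - n * x_bar ** 2
syy = sum_y2 - n * y_bar ** 2
r = sxy / math.sqrt(sxx * syy)

# Part (3): scale the known total area by volume per unit area.
total_area = 186
total_volume = total_area * (sum_y / sum_x)
print(round(x_bar, 2), round(y_bar, 2), round(r, 2), round(total_volume, 1))
```

Under these assumptions the means are 0.06 and 0.39, the correlation coefficient rounds to 0.97, and the estimated total timber volume is about 1209 cubic meters.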
taiwan-gsat 2024 Q9 5 marks
A laboratory collected a large number of individuals of two similar species, $A$ and $B$, and recorded their body length $x$ (in centimeters) and body weight $y$ (in grams). The average body lengths of species $A$ and $B$ are $\overline{x_{A}} = 5.2$ and $\overline{x_{B}} = 6$ respectively, with standard deviations 0.3 and 0.1 respectively. Let the average body weights of species $A$ and $B$ be $\overline{y_{A}}$ and $\overline{y_{B}}$ respectively. The regression lines of body weight $y$ on body length $x$ for species $A$ and $B$ are $L_{A}: y = 2x - 0.6$ and $L_{B}: y = 1.5x + 0.4$ respectively, with correlation coefficients 0.6 and 0.3 respectively. An individual $P$ with body length 5.6 centimeters and body weight 8.6 grams is discovered. Select the correct options.
(1) $\overline{y_{A}} < \overline{y_{B}}$
(2) The standard deviation of body weight for species $A$ is less than that for species $B$
(3) For species $A$, the absolute difference between individual $P$'s body weight and the average body weight $\overline{y_{A}}$ is greater than one standard deviation
(4) The distance from point $(5.6, 8.6)$ to line $L_{A}$ is less than its distance to line $L_{B}$
(5) The distance from point $(5.6, 8.6)$ to point $(\overline{x_{A}}, \overline{y_{A}})$ is less than its distance to point $(\overline{x_{B}}, \overline{y_{B}})$
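The five statements can be tested numerically. The following is a minimal Python sketch, assuming two standard facts about a least-squares line of $y$ on $x$: it passes through the point of means, and its slope equals $r \cdot s_y / s_x$. The helper names `weight_stats` and `distance_to_line` are hypothetical.

```python
import math


def weight_stats(x_bar, sd_x, slope, intercept, r):
    """Mean and standard deviation of y implied by the regression line."""
    y_bar = slope * x_bar + intercept  # line passes through (x_bar, y_bar)
    sd_y = slope * sd_x / r            # from slope = r * sd_y / sd_x
    return y_bar, sd_y


def distance_to_line(px, py, slope, intercept):
    """Perpendicular distance from (px, py) to y = slope * x + intercept."""
    return abs(slope * px - py + intercept) / math.hypot(slope, 1.0)


y_bar_a, sd_y_a = weight_stats(5.2, 0.3, 2.0, -0.6, 0.6)
y_bar_b, sd_y_b = weight_stats(6.0, 0.1, 1.5, 0.4, 0.3)
px, py = 5.6, 8.6

checks = {
    1: y_bar_a < y_bar_b,
    2: sd_y_a < sd_y_b,
    3: abs(py - y_bar_a) > sd_y_a,
    4: distance_to_line(px, py, 2.0, -0.6) < distance_to_line(px, py, 1.5, 0.4),
    5: math.hypot(px - 5.2, py - y_bar_a) < math.hypot(px - 6.0, py - y_bar_b),
}
print(checks)
```

Under these assumptions the implied average weights are about 9.8 and 9.4 grams, the implied standard deviations are about 1.0 and 0.5 grams, and only statement (3) evaluates to true.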
taiwan-gsat 2025 Q5 5 marks
A company collected data on the number of customers $x$ (in units of 100 people) and sales revenue $y$ (in units of 10,000 yuan) from 8 branches last week, obtaining 8 data points $(x, y)$ as follows: $(3,3), (3,5), (3,2), (4,4), (5,8), (6,7), (8,12), (8,7)$. These 8 points are plotted on the coordinate plane, and the regression line equation for $y$ with respect to $x$ is determined to be $y = \frac { 5 } { 4 } x - \frac { 1 } { 4 }$.
The company wants to analyze the data from another perspective. The customer counts and the sales revenues are each sorted separately from smallest to largest and then paired in order, giving 8 new data points $(x, y)$ as follows: $(3,2), (3,3), (3,4), (4,5), (5,7), (6,7), (8,8), (8,12)$. Let the regression line equation for $y$ with respect to $x$ for the 8 new data points be $y = m x + b$, where $m, b$ are real numbers. Based on the above, select the correct option.
(1) $m = \frac { 5 } { 4 }$ and $b = - \frac { 1 } { 4 }$
(2) $m > \frac { 5 } { 4 }$ and $b > - \frac { 1 } { 4 }$
(3) $m > \frac { 5 } { 4 }$ and $b < - \frac { 1 } { 4 }$
(4) $m < \frac { 5 } { 4 }$ and $b > - \frac { 1 } { 4 }$
(5) $m < \frac { 5 } { 4 }$ and $b < - \frac { 1 } { 4 }$
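Both regression lines can be computed directly from the listed points. The following is a minimal Python sketch using the usual least-squares formulas $m = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $b = \bar{y} - m \bar{x}$; the function name `least_squares` is hypothetical.

```python
def least_squares(points):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


original = [(3, 3), (3, 5), (3, 2), (4, 4), (5, 8), (6, 7), (8, 12), (8, 7)]
resorted = [(3, 2), (3, 3), (3, 4), (4, 5), (5, 7), (6, 7), (8, 8), (8, 12)]

m0, b0 = least_squares(original)
m, b = least_squares(resorted)
print(m0, b0, m, b)
```

The original points reproduce the stated line with slope $5/4$ and intercept $-1/4$. The re-paired points give $m = 11/8$ and $b = -7/8$, so the slope increases and the intercept decreases. Sorting both coordinates leaves the means and the spread of $x$ unchanged, so only the cross term $\sum x_i y_i$ changes.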
taiwan-gsat 2025 Q12 5 marks
A certain alloy is composed of two metals, A and B. A student wants to know the relationship between the metal ratio and the wavelength of the alloy. He conducted an experiment measuring ``the wavelength $y$ (in nanometers) of an alloy with A comprising $x\%$'' and plotted 20 data points $(x_k, y_k)$, $k = 1, \cdots, 20$, on the $xy$ plane. The regression line (best-fit line) is $y = 21.3 x - 40$.
To comply with submission standards, the report must be described as ``the wavelength $v$ (in micrometers) of an alloy with B comprising $u\%$''. He converted the data $(x_k, y_k)$ to $(u_k, v_k)$, $k = 1, \cdots, 20$, and obtained the regression line on the $uv$ plane as $v = a u + b$. It is given that 1 nanometer $= 10 ^ { - 9 }$ meter and 1 micrometer $= 10 ^ { - 6 }$ meter. Select the correct options.
(1) $u _ { k } = 100 - x _ { k } , k = 1 , \cdots , 20$
(2) $v _ { k } = 1000 y _ { k } , k = 1 , \cdots , 20$
(3) The standard deviation of $u _ { 1 } , u _ { 2 } , u _ { 3 } , \ldots , u _ { 20 }$ equals the standard deviation of $x _ { 1 } , x _ { 2 } , x _ { 3 } , \ldots , x _ { 20 }$
(4) $b = 2.09$
(5) The student found another data point $(u _ { 21 } , v _ { 21 })$ satisfying $v _ { 21 } = a u _ { 21 } + b$; if these 21 data points $(u_k, v_k)$, $k = 1, \cdots, 21$, are plotted on the $uv$ plane, the regression line is still $v = a u + b$
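The coefficients $a$ and $b$ can be obtained by substitution. The following is a sketch of that derivation, assuming the alloy contains only A and B, so that $u_k = 100 - x_k$, and that 1 nanometer equals $10^{-3}$ micrometer, so that $v_k = y_k / 1000$. It also relies on the fact that a least-squares line transforms consistently under a linear change of each variable.

$$v = \frac { y } { 1000 } = \frac { 21.3 x - 40 } { 1000 } = \frac { 21.3 ( 100 - u ) - 40 } { 1000 } = \frac { 2090 - 21.3 u } { 1000 } = - 0.0213 u + 2.09$$

Under these assumptions $a = -0.0213$ and $b = 2.09$.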