SPSS-04: Descriptive Statistics

November 7, 2024

Descriptive statistics in SPSS summarize and describe the main features of a dataset, giving insight into the distribution, central tendency, and variability of the data. They make large datasets easier to summarize and help with checking data quality (Bluman, 2023). They also help reveal clusters, spread, and errors in the data, which makes data-driven decisions and interpretations easier. They are essential for exploring data, informing research hypotheses, and guiding further statistical analysis. The mean is defined for both a sample (of size n) and a population (of size N), and is denoted as follows:
– Mean of Sample (X-bar): $$\overline X\;is\;the\;sample\;mean$$
– Mean of Population (mu): $$\mu\;is\;the\;population\;mean$$

Section I. Measures of Central Tendency

1.1. Mean of Sample:

The mean is the sum of the values, divided by the total number of values (Bluman, 2023).
$$\overline X=\frac{\sum_{i=1}^nX_i}n\:(equation\;1)$$
Where:
$$\overline X\;is\;the\;sample\;mean$$
$$n\;is\;the\;sample\;size\;(number\;of\;observations)$$
$$X_i\;is\;the\;i^{th}\;observation\;in\;the\;sample\;(i.e.,\;X_1,X_2,\dots,X_n)$$
$${\textstyle\sum_{i=1}^n}X_i=(X_1+X_2+\dots+X_n)$$

Example 1.-
We assume that a farmer raises 200 chickens (N: population) on his family farm. He wants to forecast the income he will generate over the next two months. To make the prediction, he randomly selects 11 chickens (n: sample) and records their weights, as shown in Table 1.

Table 1. Income Prediction

i	Weight X_i (kg)	Mean	Mode	Median	Mid-Range
1	2.00	2.525	2.00, 2.50	2.600	2.45
2	1.80
3	2.50
4	2.60
5	3.10
6	2.70
7	2.80
8	2.00
9	3.00
10	2.78
11	2.50
Total	27.78

$$\overline X=\frac{\sum_{i=1}^nX_i}n=\frac{X_1+X_2+X_3+\dots+X_{11}}{11}=\frac{2.00+1.80+2.50+\dots+2.50}{11}=\frac{27.78}{11}\approx\mathbf{2.525}$$

Conclusion: His chickens weigh about 2.525 kg on average.
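
As a quick check of Example 1, the calculation in equation 1 can be reproduced outside SPSS. The following is a minimal Python sketch (Python is not part of the original tutorial, and the variable names are illustrative) that uses the 11 weights from Table 1.

```python
from statistics import mean

# Weights (kg) of the 11 sampled chickens from Table 1.
weights_kg = [2.00, 1.80, 2.50, 2.60, 3.10, 2.70, 2.80, 2.00, 3.00, 2.78, 2.50]

n = len(weights_kg)        # sample size
total = sum(weights_kg)    # sum of all observations
sample_mean = total / n    # equation 1: sum divided by n

print(f"n = {n}, sum = {total:.2f}, mean = {sample_mean:.3f}")
# prints: n = 11, sum = 27.78, mean = 2.525

# Cross-check against the standard library implementation.
assert abs(sample_mean - mean(weights_kg)) < 1e-9
```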

1.2. Mean of Population:
$$\mu=\frac{\sum_{i=1}^NX_i}N\:(equation\;2)$$
where:
X_i = individual value
μ = population mean
N = population size
For a worked example of this formula, see Example 4.

2. Mode:

Mode is the value of the observation that appears most frequently (Bowerman et al., 2019).
To find the mode, arrange the data from Table 1 from the lowest value to the highest value:

Sorted Data: 1.80 2.00 2.00 2.50 2.50 2.60 2.70 2.78 2.80 3.00 3.10
The values 2.00 and 2.50 each occur twice, more often than any other value. The data set therefore has two modes, 2.00 kg and 2.50 kg, and is called bimodal. Note that the mode is the most frequent value, not an average.

Note:
A data set can have:
– No mode
– One mode = Unimodal
– Two modes = Bimodal
– Three or more modes = Multimodal
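
The classification in the note above can be sketched in Python as follows. The helper name `describe_modes` is hypothetical, and the sketch assumes the textbook convention that a data set in which every value occurs only once has no mode. It also assumes the input is not empty.

```python
from collections import Counter

def describe_modes(values):
    """Return (modes, label) for a non-empty list of values."""
    counts = Counter(values)
    top = max(counts.values())          # highest frequency in the data
    if top == 1:
        return [], "no mode"            # every value occurs exactly once
    modes = sorted(v for v, c in counts.items() if c == top)
    label = {1: "unimodal", 2: "bimodal"}.get(len(modes), "multimodal")
    return modes, label

# Weights (kg) from Table 1.
weights_kg = [2.00, 1.80, 2.50, 2.60, 3.10, 2.70, 2.80, 2.00, 3.00, 2.78, 2.50]
print(describe_modes(weights_kg))   # ([2.0, 2.5], 'bimodal')
```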

3. Median

The median is the midpoint of the data array. The symbol for the median is MD (Bluman, 2023). The following steps show how to find the median (refer to Table 1).

3.1. Odd Number

Step 1. Sort the data from the lowest to the highest value.
Sorted Data: 1.80 2.00 2.00 2.50 2.50 2.60^[6th] 2.70 2.78 2.80 3.00 3.10

Step 2. Find the Median Position (MP):

$$MP=\frac{n+1}2\:(equation\;3)$$
Where n is the number of observations (the sample size). Then,
$$MP=\frac{n+1}2=\frac{11+1}2=6^{th}$$

Step 3. Select the Middle data value.
Given the odd number of observations, the midpoint of the data value is 2.60 kg.
Tip: You can count from left to right or from right to left. Either way, the middle data value is in the 6^th position.

3.2. Even Number:

Follow Steps 1 and 2 as above.
Step 3. Because the median position falls between two data values, add those two middle values and divide by 2. The result is the median.

Example 2.- Tornadoes in the United States (Bluman, 2023)
The number of tornadoes that have occurred in the United States over an 8-year period is as follows. Find the median.
684, 764, 656, 702, 856, 1133, 1132, 1303

Step 1: Sort the data:
Sorted data: 656, 684, 702, 764^[4th], 856^[5th], 1132, 1133, 1303

Step 2: Find the Median Position (MP):
$$MP=\frac{n+1}2=\frac{8+1}2=4.5^{th}$$

Step 3. Find the middle value of data:

$$The\;Median\;=\;\frac{764^{\lbrack4th\rbrack}+856^{\lbrack5th\rbrack}}2=\;\mathbf{810}$$
Since the number of observations is even, the median is the average of the 4th and 5th values, which is 810. Half of the years in this period had fewer than 810 tornadoes and half had more.
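
The three steps for odd and even sample sizes can be combined into one small routine. The Python sketch below is an illustration only. The function name `median_by_position` is hypothetical, and it follows the median position formula (equation 3) rather than any SPSS internals.

```python
from statistics import median

def median_by_position(values):
    """Find the median with the sort / position / select steps."""
    data = sorted(values)             # Step 1: sort low to high
    n = len(data)
    mp = (n + 1) / 2                  # Step 2: median position (1-based)
    if n % 2 == 1:                    # odd n: MP is a whole number
        return data[int(mp) - 1]
    lower = data[n // 2 - 1]          # even n: MP falls between two values
    upper = data[n // 2]
    return (lower + upper) / 2        # Step 3: average the two middle values

weights_kg = [2.00, 1.80, 2.50, 2.60, 3.10, 2.70, 2.80, 2.00, 3.00, 2.78, 2.50]
tornadoes = [684, 764, 656, 702, 856, 1133, 1132, 1303]

print(median_by_position(weights_kg))  # 2.6   (6th of 11 values)
print(median_by_position(tornadoes))   # 810.0 (average of 4th and 5th values)

# Cross-check against the standard library.
assert median_by_position(tornadoes) == median(tornadoes)
```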

4. Range:

The range is the Highest (H) value minus the Lowest (L) value. The symbol R is used for the range (Bluman, 2023).
Referring to Table 1, the range is R = H - L = 3.10 - 1.80 = 1.30 kg.

5. Mid-Range:

The mid-range is the sum of the Highest (H) and Lowest (L) values, divided by 2. The symbol MR is used for the mid-range (Bluman, 2023).
Referring to Table 1:
$$MR=\frac{H+L}2=\frac{3.10+1.80}2=\mathbf{2.45}\;kg$$
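
Both the range and the mid-range depend only on the smallest and largest observations, so they are easy to verify by hand or in code. A minimal Python sketch follows, with illustrative variable names and the Table 1 weights.

```python
# Weights (kg) from Table 1.
weights_kg = [2.00, 1.80, 2.50, 2.60, 3.10, 2.70, 2.80, 2.00, 3.00, 2.78, 2.50]

highest = max(weights_kg)              # H
lowest = min(weights_kg)               # L

data_range = highest - lowest          # R = H - L
mid_range = (highest + lowest) / 2     # MR = (H + L) / 2

print(f"H = {highest:.2f}, L = {lowest:.2f}")
print(f"Range = {data_range:.2f} kg, Mid-range = {mid_range:.2f} kg")
# prints: Range = 1.30 kg, Mid-range = 2.45 kg
```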

Section II. Measures of Variation

1. Sample Variance:

Sample variance is a statistical measure that quantifies the degree of variation or dispersion in a set of values drawn from a larger population. It provides an estimate of how much individual data points differ from the sample mean.
The sample variance is calculated using the following formula:

$$s^2=\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\overline X)^2\:(equation\;4)$$

Where:

$$\bar{X} \;is\;the\;sample\;mean$$

$$s^2\;is\;the\;sample\;variance$$

$$n\;is\;number\;of\;observations\;in\;the\;sample$$

$$X_i\;is\;each\;individual\;observation\;in\;the\;sample$$

Example 3.-Teacher Strikes (Bluman, 2023)

The number of public school teacher strikes in Pennsylvania for a random sample of school years is shown. Find the sample variance and the sample standard deviation. Table 2 shows the data. The sample mean used in the table is 51 / 6 = 8.5.

Table 2. Teacher Strikes

School Year	Teacher Strikes	$$(X_i-\overline X)$$	$${(X_i-\overline X)}^2$$
1	9	(9 – 8.5) = 0.5	(0.5)² = 0.25
2	10	(10 – 8.5) = 1.5	(1.5)² = 2.25
3	14	(14 – 8.5) = 5.5	(5.5)² = 30.25
4	7	(7 – 8.5) = -1.5	(-1.5)² = 2.25
5	8	(8 – 8.5) = -0.5	(-0.5)² = 0.25
6	3	(3 – 8.5) = -5.5	(-5.5)² = 30.25
Total	51		65.50
	$${\textstyle\sum_{i=1}^n}X_i$$		$${\textstyle\sum_{i=1}^n}{(X_i-\overline X)}^2$$

$$s^2=\frac1{n-1}\sum_{i=1}^n(X_i-\overline X)^2=\frac{(X_1-\overline X)^2+(X_2-\overline X)^2+\dots+(X_n-\overline X)^2}{n-1}$$

$$s^2=\frac1{n-1}\sum_{i=1}^n(X_i-\overline X)^2=\frac{65.5}5=\mathbf{13.1}$$

2. Standard Deviation of Sample:
$$s=\sqrt{\frac{\sum_{i=1}^n(X_i-\overline X)^2}{n-1}}\:(equation\;5)$$
$$s=\sqrt{s^2}=\sqrt{13.1}\approx\mathbf{3.6}$$
Here the sample variance is 13.1, and the sample standard deviation is approximately 3.6.
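
The columns of Table 2 can be rebuilt step by step to confirm the result. The Python sketch below is illustrative (the variable names are assumptions, not part of the source). It applies equations 4 and 5 directly and then compares the result with the `variance` and `stdev` functions from Python's standard `statistics` module, which also divide by n - 1.

```python
from math import sqrt
from statistics import variance, stdev

# Teacher strikes per sampled school year (Table 2).
strikes = [9, 10, 14, 7, 8, 3]

n = len(strikes)
x_bar = sum(strikes) / n                         # sample mean: 51 / 6 = 8.5
squared_devs = [(x - x_bar) ** 2 for x in strikes]
sum_sq = sum(squared_devs)                       # sum of squared deviations: 65.5

s2 = sum_sq / (n - 1)                            # equation 4: divide by n - 1
s = sqrt(s2)                                     # equation 5

print(f"mean = {x_bar}, sum of squares = {sum_sq}")
print(f"s^2 = {s2:.1f}, s = {s:.1f}")            # s^2 = 13.1, s = 3.6

# Cross-check against the standard library (both use n - 1).
assert abs(s2 - variance(strikes)) < 1e-9
assert abs(s - stdev(strikes)) < 1e-9
```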

3. Population Variance:

The population variance is the average of the squares of the distance each value is from the mean. The symbol for the population variance is σ² (σ is the Greek lowercase letter sigma). The formula for the population variance is:
$$\sigma^2=\frac{\sum_{i=1}^N(X_i-\mu)^2}N\:(equation\;6)$$
where:
X_i = individual value
μ = population mean
N = population size

4. Standard Deviation of Population:

The population standard deviation is the square root of the variance. The symbol for the population standard deviation is σ. The corresponding formula for the population standard deviation is:
$$\sigma=\sqrt{\frac{\sum_{i=1}^N(X_i-\mu)^2}N}\;or\;\sigma=\sqrt{\sigma^2}\:(equation\;7)$$

Example 4.-Comparison of Outdoor Paint (Bluman, 2023)
A testing lab wishes to test two experimental brands of outdoor paint to see how long each will last before fading. The testing lab makes 6 gallons of each paint to test. Since different chemical agents are added to each group and only six cans are involved, these two groups constitute two small populations. The results (in months) are shown in Table 3. Find (1) the mean and range of each group and (2) the variance and standard deviation for brand A paint.

Table 3. Comparison of Outdoor Paint

Brand A	Brand B
10	35
60	45
50	30
30	35
40	40
20	25
$$\mu_A=\frac{210}6=\mathbf{35}\;months$$	$$\mu_B=\frac{210}6=\mathbf{35}\;months$$

Solution:

1-Find the ranges for the paints (the means, 35 months for each brand, are shown in Table 3):
For brand A, the range is
R = 60 − 10 = 50 months

For brand B, the range is
R = 45 − 25 = 20 months
Make sure the range is given as a single number. The range for brand A shows that 50 months separate the largest data value from the smallest data value. For brand B, 20 months separate the largest data value from the smallest data value, which is less than one-half of brand A’s range.

2-Find Population Variance and Standard Deviation of Population (for Brand A):

Step 1. Find the population mean:
$$\mu_A=\frac{210}6=\mathbf{35}\;months$$
Step 2. Subtract the mean from each data value (X_i − μ).
10 − 35 = −25
50 − 35 = 15
40 − 35 = 5
60 − 35 = 25
30 − 35 = −5
20 − 35 = −15

Step 3. Square each result (X_i − μ)².
(−25)² = 625
(15)² = 225
(5)² = 25
(25)² = 625
(−5)² = 25
(−15)² = 225

Step 4. Find the sum of the squares Σ (X_i − μ)².
625 + 625 + 225 + 25 + 25 + 225 = 1750

Step 5. Divide the sum by N to get the variance:
$$\sigma^2=\frac{\sum_{i=1}^N(X_i-\mu)^2}N=\frac{1750}6\approx\mathbf{291.7}$$
Step 6. Find Standard Deviation of Population:
Take the square root of the variance to get the standard deviation. Hence, the standard deviation equals:
$$\sigma=\sqrt{\frac{\sum_{i=1}^N(X_i-\mu)^2}N}=\sqrt{\frac{1750}6}\approx\sqrt{291.7}\approx\mathbf{17.1}\;months$$
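
Steps 1 to 6 of Example 4 can be checked with a short script. The Python sketch below is an illustration under the example's assumption that each brand is a complete (small) population, so the sum of squares is divided by N rather than N - 1. The function name `population_summary` is hypothetical.

```python
from math import sqrt
from statistics import pvariance, pstdev

# Months before fading, from Table 3.
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]

def population_summary(values):
    """Mean, range, variance and standard deviation of a whole population."""
    N = len(values)
    mu = sum(values) / N                           # population mean (equation 2)
    sum_sq = sum((x - mu) ** 2 for x in values)    # sum of squared deviations
    sigma2 = sum_sq / N                            # equation 6: divide by N
    return {
        "mean": mu,
        "range": max(values) - min(values),
        "variance": sigma2,
        "sd": sqrt(sigma2),                        # equation 7
    }

a = population_summary(brand_a)
b = population_summary(brand_b)
print(f"Brand A: mean = {a['mean']}, range = {a['range']}, "
      f"variance = {a['variance']:.1f}, sd = {a['sd']:.1f}")
# prints: Brand A: mean = 35.0, range = 50, variance = 291.7, sd = 17.1
print(f"Brand B: mean = {b['mean']}, range = {b['range']}")
# prints: Brand B: mean = 35.0, range = 20

# Cross-check against the standard library population functions.
assert abs(a["variance"] - pvariance(brand_a)) < 1e-9
assert abs(a["sd"] - pstdev(brand_a)) < 1e-9
```

Using `statistics.variance` instead of `pvariance` here would divide by N - 1 and give a larger value, which is the sample formula from equation 4.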

Part I. Step-by-Step Guide for Descriptive Statistics in SPSS

This part walks through running descriptive statistics in SPSS step by step, so that you can summarize your data and interpret the resulting statistical measures.

Step 1: Open Your Data File

Launch SPSS on your computer.
Open your dataset by clicking on File > Open > Data and selecting the appropriate .sav file.

Step 2: Access the Descriptive Statistics Function

In the top menu, click on Analyze.
Hover over Descriptive Statistics.
Select Descriptives from the dropdown menu.

Step 3: Select Variables

In the Descriptives dialog box, you will see a list of variables on the left side.
Select the variable(s) you want to analyze and move them to the Variables box using the arrow button.

Step 4: Configure Options (Optional)

Click on the Options button if you want to customize the statistics that are displayed.
- Options include mean, sum, standard deviation, variance, range, minimum, maximum, standard error of the mean, kurtosis, and skewness.
- Check the boxes for the statistics you want to include in your output.
Click Continue after selecting your options.

Step 5: Run the Analysis

Once you have selected your variables and configured the options, click OK to run the descriptive statistics analysis.

Step 6: Review the Output

The output window will display a table of descriptive statistics for the selected variable(s).
- It includes measures such as the mean, standard deviation, minimum, maximum, and count (N).
Review the table to understand the central tendencies and variability in your data.

Step 7: Interpret the Results

Analyze the descriptive statistics to gain insights into your dataset.
- Mean: Indicates the average value.
- Standard Deviation: Measures the dispersion or spread of data points around the mean.
- Minimum/Maximum: Show the smallest and largest values. Their difference is the range.
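
For readers who want to cross-check the SPSS output by hand, the measures listed in Step 6 (N, minimum, maximum, mean, standard deviation) can be approximated with a short Python sketch. This is an illustration, not SPSS code. The function name `descriptives` is hypothetical, and the standard deviation uses the sample formula (equation 5), which is assumed here to match what SPSS reports.

```python
from statistics import mean, stdev

def descriptives(values):
    """Build a small summary table similar to the one described in Step 6."""
    return {
        "N": len(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Mean": mean(values),
        "Std. Deviation": stdev(values),   # sample formula, divides by n - 1
    }

# Weights (kg) from Table 1, used here as the variable to summarize.
weights_kg = [2.00, 1.80, 2.50, 2.60, 3.10, 2.70, 2.80, 2.00, 3.00, 2.78, 2.50]

table = descriptives(weights_kg)
for name, value in table.items():
    # Show floats with three decimals and counts as whole numbers.
    shown = f"{value:.3f}" if isinstance(value, float) else str(value)
    print(f"{name:<16}{shown:>8}")
```

If the values printed here differ from the SPSS table for the same variable, a common cause is missing values or a filter being applied in SPSS, so it is worth comparing N first.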

References
1. Bluman, A. G. (2022). Elementary statistics: A step by step approach (7th ed.). McGraw-Hill.
2. Bowerman, B. L., Hummel, R. M., Drougas, A. M., Moninger, K. B., Duckworth, W. M., Schur, P. J., & Froelich, A. G. (2019). Business statistics and analytics in practice (9th ed.). McGraw-Hill Education.

Veasna

Prof. Dr. SOU Veasna earned his Ph.D. from the Institute of International Business Management (IIBMA), National Cheng Kung University (NCKU), Taiwan, in 2013. He currently serves as a professor and senior researcher at the Royal University of Phnom Penh, Cambodia, where he teaches two specialist courses, Statistics and Advanced Research Methodology. He has practical experience with academic software such as SPSS, AMOS, LISREL, HLM, Megastar, NVivo, and Minitab.