Preface and basic terms

Discrete variable

A discrete variable is a variable that can take only a finite or countably infinite set of distinct values. These values are typically integers, and there are no intermediate values between them. Discrete variables often represent characteristics that can be counted, such as the number of children in a family, the number of cars in a parking lot, or the outcome of rolling a die.

Continuous variable

A continuous variable, on the other hand, is a variable that can take any value within a certain range or interval. Unlike discrete variables, continuous variables have infinitely many possible values, because between any two values there are always further values. Continuous variables are often used to represent measurements such as height, weight, time, temperature, or any other quantity that can be measured on a continuous scale.

Conducting a survey

Example. Suppose you are conducting a study on the shoe sizes and heights of students in the UAS. You collect data from a sample of 20 students. The shoe sizes of the students are recorded as a discrete variable, while their heights are recorded as a continuous variable.

Discrete variable - shoe sizes

For the shoe sizes of the students, we can create a frequency table showing the number of students with each shoe size. If only a few students have some of the shoe sizes, you can also combine the sizes into non-overlapping groups such as 3-4½, 5-6½, 7-8½, etc., depending on the range and distribution of shoe sizes in your sample.
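The frequency table described above can be sketched in a few lines of Python. The shoe sizes below are hypothetical values invented for illustration (whole sizes only, 20 students), and the two-size grouping in `group_label` is an assumed choice rather than a prescribed one.

```python
from collections import Counter

# Hypothetical shoe sizes for a sample of 20 students (illustrative, not real data).
shoe_sizes = [5, 6, 6, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 11, 11, 12]

# Frequency table: number of students with each shoe size.
frequency = Counter(shoe_sizes)
for size in sorted(frequency):
    print(f"size {size}: {frequency[size]} student(s)")


def group_label(size):
    """Assign a size to a non-overlapping group: 5-6, 7-8, 9-10, ... (assumed grouping)."""
    low = size if size % 2 == 1 else size - 1
    return f"{low}-{low + 1}"


# Grouped frequency table for sparse data.
grouped = Counter(group_label(size) for size in shoe_sizes)
for label in sorted(grouped, key=lambda s: int(s.split("-")[0])):
    print(f"sizes {label}: {grouped[label]} student(s)")
```

Because the groups do not overlap, every student is counted exactly once, so the grouped counts still add up to the sample size.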

Figure: Shoesize-adult-es.svg (CC BY-SA 3.0)

Continuous variable - heights

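For the heights of the students, create a histogram to visualize the distribution of heights. Determine appropriate bins (intervals) for the histogram, such as 140-150 cm, 150-160 cm, 160-170 cm, etc., based on the range of heights in your sample. Because height is continuous, the bins should leave no gaps: let each bin include its lower limit but not its upper limit, so that every measurement falls into exactly one bin.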
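As a minimal sketch of the binning step, the following Python code counts heights in half-open 10 cm bins and prints a text histogram. The heights, the bin width, and the starting edge of 150 cm are all assumptions made for this illustration.

```python
# Hypothetical heights in cm for 20 students (illustrative values only).
heights = [152.4, 155.0, 158.3, 160.0, 161.2, 163.5, 164.8, 166.1, 167.0, 168.4,
           169.9, 170.0, 171.3, 172.6, 174.0, 175.5, 177.2, 179.8, 182.1, 186.7]

bin_width = 10   # assumed bin width in cm
start = 150      # assumed left edge of the first bin

# Half-open bins [150, 160), [160, 170), ...: each height lands in exactly one bin.
counts = {}
for h in heights:
    left = start + bin_width * int((h - start) // bin_width)
    counts[left] = counts.get(left, 0) + 1

# Text histogram: one '#' per student.
for left in sorted(counts):
    print(f"[{left}, {left + bin_width}) cm: {'#' * counts[left]}")
```

A height of exactly 160.0 cm is counted in the bin starting at 160, not in the one ending there, which is what the half-open convention guarantees.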

Analysis

Compare and contrast the distributions of shoe sizes and heights. Discuss any patterns or trends you observe in the data. Consider questions such as:

  • Are there any predominant shoe size groups among the students?

  • What is the typical height range for students in the sample?

  • Is there any relationship between shoe size and height in the sample?
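One possible way to approach the last question is the Pearson correlation coefficient, which measures the strength of a linear relationship on a scale from -1 to 1. The text does not name a specific method, so this choice is an assumption. The paired values below are hypothetical, and `pearson_r` is a helper written for this sketch.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired observations: (shoe size, height in cm) for 8 students.
shoe = [5, 6, 7, 8, 9, 10, 11, 12]
height = [154.0, 159.5, 163.0, 168.5, 171.0, 176.5, 181.0, 185.5]

r = pearson_r(shoe, height)
print(f"r = {r:.3f}")
```

A value of r close to 1 would suggest that taller students in the sample tend to have larger shoe sizes. It does not by itself show that one quantity causes the other.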

List of fundamental terms

  1. Population or the Universal set: The entire group that is the subject of a statistical study or analysis.

  2. Sample: A subset of the population selected for analysis.

  3. Parameter: A numerical characteristic of a population, often denoted by Greek letters (e.g., \(\mu\) for population mean, \(\sigma\) for population standard deviation).

  4. Statistic: A numerical characteristic of a sample, used to estimate or infer information about a population parameter.

  5. Random Variable: A variable that can take on different values as a result of random phenomena. It can be either discrete or continuous.

  6. Probability Distribution: A function that describes the likelihood of different outcomes of a random variable.

  7. Probability Mass Function (PMF): For discrete random variables, the PMF gives the probability of each possible value.

  8. Probability Density Function (PDF): For continuous random variables, the PDF gives the relative likelihood of different values. The probability of a value falling within a particular interval is the area under the PDF over that interval; the probability of any single exact value is zero.

  9. Expected Value (Mean): The average value of a random variable, calculated as the sum of all possible values weighted by their probabilities (for a continuous variable, the corresponding integral).

  10. Variance: A measure of the spread or dispersion of a random variable’s values around its mean.

  11. Standard Deviation: The square root of the variance, providing a measure of the typical deviation of values from the mean in the same units as the variable itself.

  12. Sampling Distribution: The distribution of a statistic (e.g., sample mean or sample proportion) calculated from different samples of the same size taken from the same population.

  13. Hypothesis Testing: A statistical method used to make inferences about a population parameter based on sample data.

  14. Confidence Interval: A range of values constructed around a sample statistic by a procedure that, over repeated sampling, contains the true population parameter in a stated proportion of cases (the confidence level, e.g., 95%).

  15. Regression Analysis: A statistical technique used to explore the relationship between one or more independent variables and a dependent variable.
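Several of the terms above (PMF, expected value, variance, standard deviation) can be illustrated with the fair six-sided die mentioned earlier. The sketch below assumes each face has probability 1/6 and uses exact fractions, applying \(E[X] = \sum_x x\,p(x)\) and \(\mathrm{Var}(X) = \sum_x (x - E[X])^2\,p(x)\).

```python
from fractions import Fraction
from math import sqrt

# PMF of a fair six-sided die: each face has probability 1/6 (assumed fair).
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Expected value: possible values weighted by their probabilities.
expected_value = sum(x * p for x, p in pmf.items())

# Variance: probability-weighted squared deviations from the mean.
variance = sum((x - expected_value) ** 2 * p for x, p in pmf.items())

# Standard deviation: square root of the variance.
std_dev = sqrt(variance)

print(f"E[X]   = {expected_value}")   # 7/2
print(f"Var(X) = {variance}")         # 35/12
print(f"SD(X)  = {std_dev:.4f}")
```

The expected value 7/2 is not itself a possible outcome of a roll; it describes the long-run average over many rolls.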