Introduction
Descriptive Statistics (Exploratory Data Analysis)
- Summary Statistics
- Used to understand and describe the sample data
- Measures of Central Tendency
- Mean - average, sensitive to outliers
- Median - middle point of the data, robust to outliers
- Mode - most frequent value
- Measures of Dispersion
- Variance - average of the squared distances of each point from the mean
- Standard Deviation - square root of the variance; measures how spread out the data is around the mean, in the same units as the data; most commonly used
- Range - difference between largest and smallest observation in the data, sensitive to outliers
- Interquartile Range - difference between the 75th and 25th percentiles of the data, robust to outliers
- Descriptive statistics don’t allow us to draw conclusions beyond the data we’ve analyzed or to test any hypothesis we might have made
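
The measures of central tendency and dispersion listed above can be sketched with Python's standard `statistics` module. This is a minimal illustration, not a prescribed method: the `data` values and the `summarize` helper are hypothetical. It assumes the population form of the variance (dividing by n, matching "average of the squared distances") and the "inclusive" quartile method; other conventions give slightly different numbers.

```python
import statistics

# Hypothetical sample; the value 100 is a deliberate outlier.
data = [2, 3, 3, 5, 7, 8, 100]

def summarize(values):
    """Return common descriptive statistics for a list of numbers."""
    # quantiles(n=4) returns the three quartile cut points (Q1, Q2, Q3).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),          # pulled upward by the outlier
        "median": statistics.median(values),      # unaffected by the outlier
        "mode": statistics.mode(values),          # most frequent value
        "variance": statistics.pvariance(values), # population variance (divide by n)
        "std_dev": statistics.pstdev(values),     # square root of the variance
        "range": max(values) - min(values),       # dominated by the outlier
        "iqr": q3 - q1,                           # spread of the middle 50%
    }

summary = summarize(data)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this sample the mean and range are driven by the single outlier, while the median and interquartile range stay close to the bulk of the data.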
Inferential Statistics
- Used to make generalizations about a population based on a sample
- Key points
- Make sure samples are representative
- Selecting a good sample is critical for making inferences about the population
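- Sampling error - difference between a sample statistic and the true population value, which arises because only part of the population is observed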
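
Sampling error can be illustrated with a small simulation. The sketch below is hypothetical throughout: the population is synthetic (normally distributed values chosen only for illustration), and the `sampling_error` helper, the sample sizes, and the number of repetitions are arbitrary assumptions. It draws simple random samples and compares each sample mean with the population mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 values from a normal distribution.
population = [random.gauss(50, 10) for _ in range(10_000)]
population_mean = statistics.fmean(population)

def sampling_error(sample_size):
    """Draw a simple random sample; return sample mean minus population mean."""
    sample = random.sample(population, sample_size)
    return statistics.fmean(sample) - population_mean

# Average absolute error over repeated samples, for two sample sizes.
errors_small = [abs(sampling_error(20)) for _ in range(100)]
errors_large = [abs(sampling_error(500)) for _ in range(100)]

avg_error_small = statistics.fmean(errors_small)
avg_error_large = statistics.fmean(errors_large)
print(f"n=20:  average |error| = {avg_error_small:.3f}")
print(f"n=500: average |error| = {avg_error_large:.3f}")
```

Under these assumptions the larger samples tend to land closer to the population mean, but the error does not vanish. A larger sample also does not correct for a sample that is unrepresentative in the first place.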