Scatter diagrams
A scatter diagram is a chart that plots pairs of values for two variables to explore their relationship. It helps reveal correlation, patterns, clusters, and outliers to support analysis and decision-making.
Key Points
- Displays the relationship between two variables using plotted data pairs.
- Helps assess correlation (positive, negative, or none) and identify outliers.
- Useful for exploring potential cause-and-effect relationships and generating hypotheses to test.
- Often used in quality management, problem solving, and process improvement.
- May include a fitted trendline or correlation coefficient for clarity.
- Correlation does not prove causation; consider confounding factors.
What the Diagram Shows
- The general direction of the relationship: upward (positive), downward (negative), or no clear trend.
- The strength of association: points lying close to a line or curve (strong) versus widely scattered points (weak).
- Nonlinear patterns such as curves, thresholds, or plateaus.
- Distinct clusters that may indicate subgroups or different conditions.
- Outliers that may signal data errors, special causes, or new insights.
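The direction and strength described above are often summarized with the Pearson correlation coefficient r. The following is a minimal Python sketch using only the standard library. The function names `pearson_r` and `describe` and the 0.3/0.7 cut-offs are illustrative assumptions, not a standard, and r measures only straight-line association.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for equal-length paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two matched pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the pairs and spread of each variable around its mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    return sxy / math.sqrt(sxx * syy)


def describe(r, weak=0.3, strong=0.7):
    """Rough verbal label for r; the cut-offs are illustrative assumptions."""
    if abs(r) < weak:
        return "no clear linear trend"
    direction = "positive" if r > 0 else "negative"
    strength = "strong" if abs(r) >= strong else "moderate"
    return f"{strength} {direction}"


print(describe(pearson_r([1, 2, 3, 4], [2, 4, 6, 8])))  # strong positive
```

A value near +1 or -1 matches points lying close to an upward or downward line; a value near 0 means no straight-line trend, which does not rule out a curved one.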
How to Construct
- Define the purpose and choose two variables to compare.
- Collect paired data from the same observations and time frame.
- Put the suspected cause (independent variable) on the x-axis and the effect (dependent variable) on the y-axis, using consistent scales and units.
- Plot each data pair as a point; avoid connecting lines between points.
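- Optionally add a trendline (linear, or a curve that matches the pattern) and report the correlation coefficient (R) or the coefficient of determination (R²).
- Label the axes and units, state the time period, and note any filters or subgrouping.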
- Review for outliers and data quality issues; refine if needed.
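The optional trendline step can be sketched as an ordinary least-squares fit that also reports R². This assumes a straight line is appropriate, which should be checked on the plot first. It uses a hypothetical `fit_line` helper, and drawing the points themselves is left to whatever charting tool the team already uses.

```python
def fit_line(xs, ys):
    """Least-squares trendline y = a + b * x, returned with its R^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two matched pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must vary to fit a line")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (variation left unexplained by the line / total variation in y)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y does not vary")
    return intercept, slope, 1 - ss_res / ss_tot


a, b, r2 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"y = {a:.2f} + {b:.2f}x, R^2 = {r2:.2f}")  # y = 1.00 + 2.00x, R^2 = 1.00
```

For a single straight-line fit like this, R² equals the square of the correlation coefficient, so it carries no information about curvature; a high R² still needs a visual check.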
Inputs Needed
- Operational definitions for both variables and expected direction of influence.
- Paired measurements from a reliable data source over a defined period.
- Units, scaling choices, and any subgroup identifiers.
- Data quality checks and a sampling plan or collection method.
- Tooling to plot points and, if needed, compute trendlines and correlation.
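Collecting paired data mostly comes down to matching both measurements to the same observation. A minimal sketch, assuming each variable arrives as a dictionary keyed by a hypothetical observation ID (such as a work-item number), with `None` marking a missing measurement. The helper name `pair_by_id` and the sample IDs are illustrative only.

```python
def pair_by_id(x_by_id, y_by_id):
    """Match two measurement sets on a shared observation ID.

    Returns (xs, ys, dropped): aligned value lists for IDs present in both
    sets with non-missing values, plus the sorted IDs that could not be paired.
    """
    complete = sorted(
        key
        for key in x_by_id.keys() & y_by_id.keys()
        if x_by_id[key] is not None and y_by_id[key] is not None
    )
    xs = [x_by_id[key] for key in complete]
    ys = [y_by_id[key] for key in complete]
    # Anything seen in either set but not paired is reported, not silently lost
    dropped = sorted((x_by_id.keys() | y_by_id.keys()) - set(complete))
    return xs, ys, dropped


review_hours = {"W1": 2.0, "W2": 3.5, "W3": 1.0}  # hypothetical IDs and values
defects = {"W1": 4, "W3": 7, "W4": 1}
print(pair_by_id(review_hours, defects))  # ([2.0, 1.0], [4, 7], ['W2', 'W4'])
```

Returning the dropped IDs makes the data quality check explicit: a long `dropped` list is a sign the two sources do not cover the same observations or time frame.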
Outputs Produced
- A plotted chart that visualizes the relationship between two variables.
- Observed pattern: positive, negative, nonlinear, or no apparent relationship.
- Identified outliers or clusters for follow-up analysis.
- Optional trendline, regression equation, and correlation metric (e.g., R or R²).
- Insights and hypotheses to guide corrective actions or further testing.
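One way to produce the list of outliers is to flag points that sit far from the fitted trendline. This sketch uses a hypothetical `flag_outliers` helper and an assumed rule of thumb (a residual larger than two residual standard errors). Flagged points are candidates for investigation, not for automatic removal.

```python
import math


def flag_outliers(xs, ys, k=2.0):
    """Indices of points whose residual from the least-squares line exceeds
    k residual standard errors (k = 2 is an assumed rule of thumb)."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three matched pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must vary to fit a line")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Residual standard error with n - 2 degrees of freedom (two fitted parameters)
    se = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    if se == 0:
        return []  # every point lies exactly on the line
    return [i for i, r in enumerate(residuals) if abs(r) > k * se]


xs = list(range(1, 11))
ys = [2 * x for x in xs]
ys[5] = 30  # one synthetic point well above an otherwise perfect line
print(flag_outliers(xs, ys))  # [5]
```

Because an extreme point also pulls the fitted line and inflates the residual spread, this simple rule can miss outliers in small samples; it complements, rather than replaces, looking at the chart.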
Interpretation Tips
- Look for overall shape first, then assess strength and direction of correlation.
- Test for nonlinearity; a straight line may not fit curved patterns.
- Beware of confounders and overlapping subgroups that can mask the true relationship.
- Do not infer causation solely from correlation; validate with experiments or additional evidence.
- Check sample size and data range; restricted ranges can hide correlations.
- Investigate outliers for special causes or data errors before drawing conclusions.
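The nonlinearity tip can be made concrete: a perfect U-shaped relationship can still produce a correlation of zero, because r measures only straight-line association. The data below are synthetic (y = x² on a symmetric range), chosen purely for illustration, and `pearson_r` is a hypothetical helper defined here so the block runs on its own.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient (linear association only)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # synthetic U-shaped relationship

print(pearson_r(xs, ys))                    # 0.0: no linear trend at all
print(pearson_r([x ** 2 for x in xs], ys))  # 1.0: perfectly related through x^2
```

The relationship was exact all along; only the straight-line summary missed it, which is why the plot should be inspected before relying on r or R².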
Example
- A team suspects that more peer review time reduces defects. They plot review hours (x-axis) against defects found in testing (y-axis) for 25 work items. The points trend downward with a moderate fit, suggesting that increased review time is associated with fewer defects.
Pitfalls
- Using cumulative data, which can create artificial trends.
- Mixing unmatched pairs or inconsistent time frames.
- Ignoring subgroup effects that require separate plots or color-coding.
- Overreliance on R or R² without visually checking the plot for nonlinearity or outliers.
- Poor axis scaling that exaggerates or hides the pattern.
- Assuming causation and implementing changes without controlled testing.
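The cumulative-data pitfall is easy to reproduce. In this sketch the two daily series are made-up numbers with zero correlation, yet their running totals both rise over time and therefore correlate strongly. The helper name `pearson_r` is hypothetical and redefined here so the block runs on its own.

```python
import math
from itertools import accumulate


def pearson_r(xs, ys):
    """Pearson correlation coefficient for equal-length paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up daily counts with zero correlation between them
daily_a = [5, 1, 4, 2, 3]
daily_b = [4, 4, 2, 2, 3]

# Running totals of positive counts can only rise, so they appear to move together
total_a = list(accumulate(daily_a))  # [5, 6, 10, 12, 15]
total_b = list(accumulate(daily_b))  # [4, 8, 10, 12, 15]

print(round(pearson_r(daily_a, daily_b), 2))  # 0.0
print(round(pearson_r(total_a, total_b), 2))  # 0.97
```

Plotting the per-period values rather than the running totals avoids this artificial trend.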
PMP Example Question
A project team wants to verify whether an increase in training hours is related to improved first-pass quality. Which tool should they use?
- A. Histogram.
- B. Control chart.
- C. Scatter diagram.
- D. Checklist.
Correct Answer: C — Scatter diagram.
Explanation: A scatter diagram plots paired values for two variables to assess their relationship. It is the appropriate tool to explore correlation between training hours and quality outcomes.