Data Displays and Relative Frequencies

A school administrator hands you a spreadsheet containing the test scores, lunch preferences, and attendance records of four hundred seventh graders. Staring at the raw rows and columns reveals almost nothing. Human cognition is not built to parse large grids of symbols; we are visual creatures. To understand the story hidden within a dataset, we must translate numbers and labels into physical dimensions: lengths, areas, and angles.

A spreadsheet organizes raw data into a grid of rows and columns, which must then be visually translated for humans to easily interpret underlying patterns.

The fundamental divide in this translation lies in the nature of the data itself. Before you can draw a single axis, you must ask what kind of information you possess. Categorical data represents characteristics or groups—such as a student's favorite extracurricular activity or their primary mode of transportation to school. In contrast, numerical data represents measurable or countable quantities—such as the exact weight of a student's backpack or their score on a midterm exam. Recognizing this dichotomy is the key to selecting the correct statistical tool. You cannot calculate the "average" favorite color, just as you would not chart fifty distinct test scores as independent, unrelated categories.
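To make the distinction concrete, here is a minimal Python sketch. The two small lists are hypothetical sample data invented for illustration, and `relative_frequencies` is a helper name chosen here, not something defined elsewhere in the text. The sketch treats categorical values by counting them and dividing by the total (the relative frequency), and treats numerical values with arithmetic such as the mean.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample data, invented for illustration only.
transport = ["bus", "walk", "bus", "car", "bike", "bus", "walk", "car"]  # categorical
backpack_kg = [4.5, 5.0, 3.5, 6.0, 4.0, 5.5, 4.5, 3.0]                  # numerical


def relative_frequencies(values):
    """Return each category's share of the total as a fraction between 0 and 1."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


# Categorical data: count the members of each group, then divide by the total.
transport_shares = relative_frequencies(transport)

# Numerical data: arithmetic such as the mean is meaningful.
average_weight = mean(backpack_kg)

print(transport_shares)
print(average_weight)
```

Asking for `mean(transport)` would raise an error, which mirrors the point about the "average" favorite color: the operation only makes sense for numerical data. The shares for a categorical variable always sum to 1, which is what lets them be drawn as slices of a circle or segments of a bar.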