At the end of a conversation with a client last week, I heard an interesting remark. The client shared that she’d always used graphs in her work, but only at the end of a project, when she was writing up her report or presentation, never at the beginning, as I’d been discussing. Going forward, she continued, she was going to start her projects by creating a few graphs of her data, to see any trends or patterns from the get-go.
That got me thinking: there must be other people who do the same thing, using graphs and charts only once a project is nearing its end. If that’s you, you’re handicapping yourself. Being able to literally SEE your data is one of the most powerful tools in your toolbox.
Look At Your Data
Whether you’re a seasoned analytics practitioner or have only just started analyzing data, the first step is to “see” your data. Usually this means cleaning and wrangling the data and running your descriptive statistics. But an important part of this step is also graphing your data. Why? To paraphrase Yogi Berra: You can see a lot just by looking.
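If you work in Python rather than Excel, a first pass at descriptive statistics might look like the minimal sketch below. It uses only the standard-library `statistics` module; the `describe` function name and the sample numbers are illustrative, not part of any particular toolkit.

```python
import statistics


def describe(values):
    """Return a few common descriptive statistics for a list of numbers."""
    data = sorted(values)
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),  # sample standard deviation (n - 1)
        "min": data[0],
        "max": data[-1],
    }


# Illustrative, made-up values.
summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
print(summary["median"])  # 4.5
```

Numbers like these are a good start, but they summarize the data rather than show it, which is where a graph comes in.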
Graph Your Data
When is the last time you made a box and whiskers plot? Be honest: when’s the last time you even thought about one? My guess is, it’s been a while. Let’s refresh our memories. We’ll start with the box and work outwards from there.
- The Median: Start at the very middle of the box. See that horizontal line in the middle of the box? That’s your median.
- Rusty on how to find the median? It’s the middle number when all your numbers are lined up smallest to largest. If you have an even number of values, it’s the average of the two middle numbers.
- The Top and Bottom Lines of the Box: The bottom line is your 25th percentile (the first quartile) and the top line is your 75th percentile (the third quartile). In other words, the box holds the middle 50% of all your values.
- The Whiskers: These are the lines extending out from the box, and they can be calculated in several different ways. The most common convention is that each whisker reaches at most 1.5 times the distance between the upper and lower quartiles (that is, 1.5 times the height of the box itself, also called the interquartile range or IQR), stopping at the most extreme data point that falls within that limit.
- The Outliers: See those dots that are out past the whiskers? Those are your extreme values, your outliers. (IMHO, those are the most interesting part of your plot! But that’s for another post…)
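To make the pieces above concrete, here is a minimal Python sketch that computes them from a list of numbers, assuming the 1.5 × IQR whisker convention described above. The `box_plot_stats` name is hypothetical, and quartiles can be computed several ways (here, the standard library’s "inclusive" method), so Excel or another tool may report slightly different box edges for the same data.

```python
import statistics


def box_plot_stats(values):
    """Compute the parts of a box plot, assuming 1.5 x IQR whiskers."""
    data = sorted(values)
    # Quartiles via linear interpolation ("inclusive"); other tools may differ.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # the height of the box
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points still inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    # Anything beyond the fences is drawn as an individual outlier dot.
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Illustrative, made-up values.
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats["median"], stats["outliers"])  # 5.5 [100]
```

Here the value 100 lands outside the upper fence and is reported as an outlier, while the top whisker stops at 9, the largest value still within reach.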
Look for Patterns In Your Data
Take a look at the example below, a plot of one variable in a fake dataset. The median is right around 7; the 25th and 75th percentiles are close by, at just over six and just under eight, telling us the middle half of the data are pretty clustered between six and eight. Interesting. Is that what we were expecting?
Take a look at the extent of the whiskers, too. Any surprises there? But now look at the outliers — there are a couple on the bottom end of the plot, but a lot more at the top, including one lone data point out above fourteen. That’s pretty far away from the rest of the data! What’s happening there?
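Once you spot lopsided outliers like these, it can help to list them. The sketch below, a hypothetical `outlier_report` helper under the same 1.5 × IQR assumption, splits the outliers into low and high groups and measures how far past its fence each one sits, in units of the box height, so a lone far-away point stands out. The sample values are made up for illustration.

```python
import statistics


def outlier_report(values):
    """Split 1.5 x IQR outliers into low and high groups and find the most extreme."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    low = sorted(x for x in values if x < low_fence)
    high = sorted(x for x in values if x > high_fence)

    def distance(x):
        # How far past its fence a point sits, measured in IQRs.
        fence = low_fence if x < low_fence else high_fence
        return abs(x - fence) / iqr if iqr else float("inf")

    most_extreme = max(low + high, key=distance, default=None)
    return {"low": low, "high": high, "most_extreme": most_extreme}


report = outlier_report([-20, 4, 5, 5, 6, 6, 6, 7, 7, 8, 30, 60])
print(report["low"], report["high"], report["most_extreme"])  # [-20] [30, 60] 60
```

Counting the two groups answers "more on top or on the bottom?", and the most extreme value is the first one worth tracing back to its source.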
“And now for something completely different…”
Now we have the plot below. It’s telling a different story. Look in particular at the outliers for this one. Again, there’s one that’s very far away from all the other data points — is it an error? If not, what’s the story behind it? Oftentimes
Connect With Me
Give this a try with some of your data. Excel makes it easy to do: simply select your data and “Insert” a Box and Whisker plot. What pops out? What questions does it leave you with? If you find yourself with questions you’re not sure how to answer, I’d be happy to take a look. You can sign up here for a free 30-minute consultation via Zoom. We can talk through your plot, go over your questions, and I can weigh in on questions you might not have thought of yet.
I look forward to connecting!