Bullshit Measurement

Are you measuring what you think you’re measuring? Could you be measuring something else entirely?

Accurate measurement of variables is essential for business success.  Sometimes, it’s fairly easy to record these variables – sales, revenue, profit.  Other times, it can be very very difficult.  For example, let’s say that you want to hire employees that are smart and conscientious.  How can we measure intelligence and conscientiousness?

Well, a good starting point is to develop a test or survey.  Many intelligence tests exist with varying levels of sophistication and accuracy, and you could pay to give these tests to applicants.  Many self-report surveys also exist that can measure conscientiousness, and you could pay to give these tests to applicants, too.  But what if you don’t want to use one of these existing measures?  What’s the worst that could happen?

In this post, we won’t talk about the worst that could happen, but we’ll discuss a pretty bad outcome: when your measure inadvertently gauges the wrong construct, which could result in a lawsuit.

I should also note that this example comes from an actual consulting experience that I encountered.  The names have been changed, but remember that these things actually happen in industry!


I was once hired along with a full team to review the new selection system of a trendy company.  Let’s call them X-Corp.  X-Corp wanted their selection system to measure a construct that they invented: “the ideal X-Corp employee.”  They made a list of the ideal X-Corp employee characteristics.  It included common constructs like intelligence and conscientiousness, but it also included some unorthodox constructs.  These included hip, stylish, savvy, sleek, and so forth.  X-Corp argued that the ideal employee needed to appeal to any potential customers, and therefore needed to have these characteristics; however, my team was already doubtful about the business relevance of these constructs.

Even more concerning, X-Corp felt that their survey had to attract people to work for X-Corp.  For this reason, it couldn’t be a traditional survey.  It had to be different and exciting.  Once again, we were doubtful about how exciting a selection survey could be.

When we saw the survey to measure “the ideal X-Corp employee,” we began to worry even more.  The first question looked something like this:

Bullshit Measurement 1

What?

The text of the item read, “Using the scale, please indicate whether you are more like a sports car or a hybrid/electric car.”

…What?

Immediately, we asked X-Corp what this item was meant to measure.  Sure enough, they just said “the ideal X-Corp employee.”  We asked which subdimension, specifically, was the item meant to measure.  As they couldn’t respond, we realized that they didn’t really have an idea.  It seemed that they just put things in their survey that they thought would be a good idea without really thinking about the ramifications.

Do you think this item would help identify good employees?  Well, we first have to ask what the “correct” answer is.  According to X-Corp, the correct answer was being more like a hybrid/electric car.  So, anyone who indicated that they were more like a sports car got the item wrong.  Do you think this is fair?  More importantly, do you think those that feel more like a “hybrid/electric car” are necessarily better than those that feel more like a “sports car?”  I would guess that the answer is probably not.  There are probably many sports car people that are more intelligent, conscientious, hip, savvy, and so on when compared to hybrid/electric car people.  Thus, this item probably fails to measure “the ideal X-Corp employee.”

That item was bad, but it wasn’t the worst.  The worst was probably the following item:

Bullshit Measurement 2

Once again, what?

The text of the item read, “Using the scale, please indicate whether you are more or less like Kanye West.”

Once again…what?

X-Corp claimed that Kanye West was too narcissistic, and anyone who felt that they were like Kanye was not welcome at X-Corp.  Do you think that Kanye people are inherently worse than non-Kanye people?  Once again, I am guessing that the answer is probably not.  Kanye people are probably just as good as non-Kanye people, and perhaps even better in some regards (e.g., creative, hip, etc.).  But can you think of anything else that this item might inadvertently measure?  Let’s look at the graph below, which is similar to the actual results.

Bullshit Measurement Graph

As some of you may have guessed, African Americans were much more likely to see themselves as similar to Kanye than Caucasians were.  This makes sense, as Kanye himself is African American.  Thus, this item partially measures the applicant’s ethnicity.

Remember when I said that those responding that they were more like Kanye were rated as worse applicants?  If this survey went live, that would mean that African Americans would automatically be penalized, thereby resulting in adverse impact.  This would almost assuredly result in a lawsuit, in which X-Corp would have a very hard time defending the claim that the Kanye question actually represented job performance.  This could have cost the company millions of dollars!
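To make “adverse impact” a bit more concrete, here is a minimal Python sketch of one common screening check used in the United States: the four-fifths rule from the EEOC’s Uniform Guidelines on Employee Selection Procedures.  The rule is not part of the X-Corp story itself, and the applicant counts below are invented purely for illustration.  The function names are my own as well.

```python
def selection_rate(passed, applicants):
    """Proportion of a group's applicants who pass the selection step."""
    return passed / applicants


def adverse_impact_ratio(focal_rate, reference_rate):
    """Ratio of a group's selection rate to the highest group's rate."""
    return focal_rate / reference_rate


# Hypothetical counts (NOT the X-Corp data): 100 applicants per group.
rate_group_a = selection_rate(passed=30, applicants=100)  # 0.30
rate_group_b = selection_rate(passed=60, applicants=100)  # 0.60

ratio = adverse_impact_ratio(rate_group_a, rate_group_b)

# Under the four-fifths rule, a ratio below 0.80 is generally treated
# as evidence of adverse impact that deserves a closer look.
flagged = ratio < 0.8
print(f"Impact ratio: {ratio:.2f}, flagged: {flagged}")
```

With these made-up numbers, group A is selected at half the rate of group B, so the item (or the whole survey) would be flagged.  A flag like this is only a first screen; it does not replace a proper validity study or legal advice.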

In the end, my team strongly recommended that the company should not use their selection survey, and should instead use a traditional survey.  The company wasn’t happy, and we were never asked to work with the company again.  But, they did guarantee that they would not use their selection system.  While it wasn’t the most satisfying result, I was happy that we were able to stop another case of Statistical Bullshit!

If you have any questions or comments about this story, feel free to contact me at MHoward@SouthAlabama.edu .  Also, feel free to contact me if you have any Statistical Bullshit stories of your own.  I’d love to include them on StatisticalBullshit.com!

Small Samples, Big Problems

Have you ever discussed statistical power or representative samples at work? Should you?

Often in business, we are restricted to relatively small samples.  In fact, a recent publication in the Journal of Organizational Behavior suggests that the most common type of business is a microbusiness – often defined as a business with fewer than 10 employees (Brawley & Pury, 2017).  As many readers already know, almost all statistics require many more participants.  For instance, the most common recommendation for a correlation analysis is a minimum of 30 participants, and more advanced statistics most often require even more participants – often in the 100s.

But what is really the harm in having a small sample size?  Can the results really be that misleading?  The answer is yes.

This post discusses two concerns of small samples: power and representativeness.

Power is the likelihood of a statistical analysis to discover a significant result if an effect actually exists in the population…But what does that mean?  Well, I’ll discuss this much more in-depth in a later post, but sample size is an important component to calculating statistical significance.  Even if an effect is strong in the population, a statistical test using a very small sample size will often fail to identify that effect as statistically significant.  Weird, right?

Let’s use this example:  Imagine that we are studying a pretty strong effect that has a population correlation of .40, such as the relationship between self-efficacy and job performance.  To study this relationship, let’s say that we use a microbusiness – one with eight employees – and we measure self-efficacy and job performance with each employee.  What is the likelihood that the resultant correlation between the two variables will be statistically significant, if we know the population correlation of the variables is .40?  Well, the likelihood that the result will be statistically significant is only about 15%!  We would fail to reject the null more than four out of every five times!

Crazy!  This example demonstrates one important reason to have a large sample size – with a small sample, you often cannot identify significant results even when the effect truly exists.  To learn more about this phenomenon, I suggest reading more about statistical power (Cohen, 1992a, 1992b; Murphy et al., 2014) and playing with a sample size/power calculator (http://www.sample-size.net/correlation-sample-size/).
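If you would rather see the arithmetic than trust a calculator, here is a minimal Python sketch of the power calculation for a correlation.  It uses the Fisher z approximation with a two-tailed test at alpha = .05, which is my assumption about the setup; online calculators may use slightly different methods and give slightly different numbers.  The function name is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(rho, n, alpha=0.05):
    """Approximate two-tailed power for testing H0: rho = 0.

    Uses the Fisher z transformation, under which atanh(r) is roughly
    normal with standard error 1 / sqrt(n - 3).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    z_effect = atanh(rho) * sqrt(n - 3)         # expected test statistic
    upper_tail = 1 - std_normal.cdf(z_crit - z_effect)
    lower_tail = std_normal.cdf(-z_crit - z_effect)
    return upper_tail + lower_tail


power_small = correlation_power(rho=0.40, n=8)    # the eight-person microbusiness
power_large = correlation_power(rho=0.40, n=150)  # a much larger sample

print(f"n = 8:   power is about {power_small:.2f}")
print(f"n = 150: power is about {power_large:.2f}")
```

With eight employees, this approximation puts power at roughly .16, in the same neighborhood as the figure quoted above.  With 150 employees, the same effect is detected almost every time.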

Next, let’s discuss having a representative sample.  Even if we have more employees, let’s say 150, there is a chance that our sample is not representative of the population.  If a sample is representative, it accurately reflects the members of the population.  Often, we assume that a randomly selected sample is representative, but this is not always the case.  Certain people may not volunteer to take your survey, and that may skew your results…But how bad can it be?

Well, let’s look at the self-efficacy and job performance example again with a correlation of .40.  If we had a representative sample of 300 people, the result might look something like this:

Example 1

Not too bad – the regression line shows a clear, increasing relationship.  Now, let’s take 150 of these people and graph the results again:

Example 2

Whoa!  Big difference!  Now the correlation between the two is literally .00, and we only removed half of the participants.  What happened?

As you guessed, I did not take a random subset of the 300 people.  Instead, I selected only those that scored five or above on the self-efficacy measure, as you can see with the differing axis labels in the two charts.  This resulted in the sample being non-representative (because everyone with a self-efficacy score under five was missing), and thereby the result was greatly different from that of the entire set of 300 people.
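You can reproduce the general idea with a small simulation.  The Python sketch below is not the data behind the charts; it generates standardized scores with a population correlation of .40 and then keeps only the people at or above the average on the first variable.  The cutoff at the mean, the sample size of 20,000, and the random seed are all my assumptions, chosen so the pattern is stable from run to run.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate_pairs(n, rho, rng):
    """Draw n (self-efficacy, performance) pairs with correlation rho."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rho * x + sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        pairs.append((x, y))
    return pairs


rng = random.Random(42)  # fixed seed so the sketch is repeatable
pairs = simulate_pairs(20000, 0.40, rng)

# Everyone: a representative sample of the simulated population.
full_r = pearson_r([x for x, _ in pairs], [y for _, y in pairs])

# Only people at or above the mean on self-efficacy (non-representative).
top_half = [(x, y) for x, y in pairs if x >= 0]
restricted_r = pearson_r([x for x, _ in top_half], [y for _, y in top_half])

print(f"Full sample r:       {full_r:.2f}")
print(f"Restricted sample r: {restricted_r:.2f}")
```

The restricted correlation comes out noticeably smaller than the full-sample correlation.  How much it shrinks depends on where the cutoff falls and on the data at hand, and with only 150 people the result can bounce around quite a bit.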

But could this ever happen in business?  Yes!  Imagine that you are feeling down about your work performance and unable to do the most basic tasks.  Then, you see an email about a job survey to measure self-efficacy and performance.  Would you take it?  Maybe, but a lot of people would just delete the email in order to avoid facing their lackluster self-perceptions, abilities, and performance.

Also, who would typically take those surveys anyway?  The grumpy employees that just want to do their work and go home?  Or the goodie-goodies that do whatever their boss asks?  I’d guess the latter, and the samples may not be representative of all these employees.

And think about those satisfaction surveys at restaurants.  Yes, people that really hated the service or really loved the service will complete them…but what about all the people in the middle?  Have you ever completed a satisfaction survey when the service was just okay?  I’m guessing not, which makes the results non-representative.

So, whenever you need to collect data, be sure to carefully consider your sample – its size for statistical power, and its makeup for representativeness.  If you ignore these two aspects, then you could obtain results that are entirely misleading, and thereby implement policies that do nothing for your company – or worse!

Until next time, watch out for Statistical Bullshit!  And email me at MHoward@SouthAlabama.edu if you have any questions, comments, or anything else!


References

Brawley, A. M., & Pury, C. L. (2017). Little things that count: A call for organizational research on microbusinesses. Journal of Organizational Behavior, 38, 917-920.

Cohen, J. (1992a). Statistical power analysis. Current Directions in Psychological Science, 1(3), 98-101.

Cohen, J. (1992b). A power primer. Psychological Bulletin, 112(1), 155-159.

Murphy, K. R., Myors, B., & Wolach, A. (2014). Statistical power analysis: A simple and general model for traditional and modern hypothesis tests. Routledge.

Regression Toward the Mean

Can you make a career on Statistical Bullshit?

Regression Toward the Mean is one of the most common types of Statistical Bullshit in industry.  And, as the title quotation insinuates, some consultants have made an entire career swindling money from organizations through manipulating this statistical phenomenon.  If you are currently in practice, or ever plan to be, read on to discover whether you are currently being swindled out of thousands – or possibly millions!

Businesses are always on a time-series.  That is, most organizations are not worried about the profit that they turned today, but rather the profit that they will turn tomorrow.  For this reason, many types of statistics and methodologies applied in business are meant to analyze longitudinal trends in order to predict future results.

Let’s take perhaps the simplest time-series design: a single variable measured on multiple occasions.  In this example, let’s say that we are looking at overall company revenue in millions.

Month        Revenue ($ millions)
January      10.5
February     10
March        11
April        12.5
May          10.5
June         10
July         12
August       11
September    11
October      5

It seems that the average monthly revenue from January through September was about $11 million, but a severe drop occurred in October.  What would you do if your company revenue looked something like this?

Most anyone would say panic and take extreme measures – and that is what most companies do.  A company may replace the CEO, lay off a large number of workers, or immediately implement a new corporate strategy.  Let’s say that a company does all three for our example, and the result looks like this:

Regression Toward the Mean without Text

Success!  The new CEO is a genius!  The lay-offs worked!  And the new corporate strategy is brilliant!  Right?  Well, maybe not.

The Regression Toward the Mean phenomenon suggests that a time-series dataset will tend to revert back toward its average after an extreme value.  In other words, when an extreme high- or low-value occurs, it is much more difficult to get any more extreme than it is to revert back to the average.  So, in this instance, it is fully possible that the company’s actions successfully caused revenue to revert back to more normal values; however, it is perhaps just as likely that the revenue simply regressed back toward the mean naturally.  So, the new changes (and money spent!) may have actually done very little or even nothing at all…but you can always be sure that the new CEO will take credit for it.
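A quick simulation shows how this can happen with no intervention at all.  The Python sketch below assumes that monthly revenue is just independent random noise around a stable average of $11 million with a standard deviation of $1 million.  Those numbers, the cutoff of $9 million, and the function names are my own illustrative assumptions; real revenue usually carries trends and seasonality that this ignores.

```python
import random


def simulate_revenue(months, mean, sd, rng):
    """Independent monthly revenue draws around a stable average."""
    return [rng.gauss(mean, sd) for _ in range(months)]


def follow_up_after_extremes(series, cutoff):
    """Collect each month below the cutoff and the month right after it."""
    extremes, followers = [], []
    for current, following in zip(series, series[1:]):
        if current < cutoff:
            extremes.append(current)
            followers.append(following)
    return extremes, followers


rng = random.Random(7)  # fixed seed so the sketch is repeatable
revenue = simulate_revenue(100000, mean=11.0, sd=1.0, rng=rng)

# "Bad" months: revenue more than two standard deviations below average.
extremes, followers = follow_up_after_extremes(revenue, cutoff=9.0)

avg_extreme = sum(extremes) / len(extremes)
avg_following = sum(followers) / len(followers)

print(f"Average revenue in a bad month:     {avg_extreme:.2f}")
print(f"Average revenue in the month after: {avg_following:.2f}")
```

Nobody was fired and no strategy changed in this simulation, yet the month after a terrible month looks, on average, like an ordinary month again.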

Let’s discuss another common example of Regression Toward the Mean in business.  Imagine you are a floor manager at a factory, and your monthly number of dangerous incidents looks something like this.

Month        Incidents
January      5
February     4
March        8
April        6
May          6
June         4
July         2
August       8
September    5
October      20

Wow!  Quite the spike in incidents!  So, what do you do?  Of course, you’d ask your CEO to bring in a safety expert to reduce the number of dangerous incidents, and I can guarantee that the results will look something like this:

Regression Toward the Mean without Text - 2

Another success!  The safety expert saved lives!  You are brilliant!  As you guessed, however, this may not be the case.

Once again, a Regression Toward the Mean effect may have occurred, and the number of safety incidents naturally reverted back to an average level.  The money spent on the safety expert could have been used for other more fruitful purposes, but you can nevertheless take credit for saving your coworkers’ lives.

Despite these two examples (and many many more that could be provided), not all instances of extreme values can be cured by waiting for the values to revert to more typical figures.  Sometimes, an effect is actually occurring, and an intervention is truly needed to fix a problem.  Without it, things could possibly get even worse.

So, what should you do when extreme values occur?  Perform an intervention?  Wait it out?  In academia, the answer is simple.  Most researchers have the luxury of collecting data from a control group that does not receive the intervention, and then comparing the data after a sufficient amount of time has passed.  If the intervention group had better outcomes than the control group, then the intervention was indeed a success.  If the two groups have roughly equal outcomes, then the intervention likely had no effect.
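Here is a minimal Python sketch of why the control group matters.  It simulates many factories, picks out the ones that just had a spike in incidents, and randomly gives half of them an “intervention” that, by construction, does nothing.  The average of 5.5 incidents, the standard deviation of 2, the spike cutoff of 9.5, and the use of a normal distribution for incident counts are all simplifying assumptions of mine.

```python
import random


def next_month_after_spike(n_sites, mean, sd, spike_cutoff, rng):
    """Return next-month incidents for treated and control sites.

    Only sites whose current month exceeds spike_cutoff are included.
    The intervention has NO true effect in this simulation.
    """
    treated, control = [], []
    for _ in range(n_sites):
        this_month = rng.gauss(mean, sd)
        if this_month <= spike_cutoff:
            continue  # no spike, so nobody calls the safety expert
        next_month = rng.gauss(mean, sd)  # same process for both groups
        if rng.random() < 0.5:
            treated.append(next_month)
        else:
            control.append(next_month)
    return treated, control


rng = random.Random(11)  # fixed seed so the sketch is repeatable
treated, control = next_month_after_spike(
    n_sites=200000, mean=5.5, sd=2.0, spike_cutoff=9.5, rng=rng
)

avg_treated = sum(treated) / len(treated)
avg_control = sum(control) / len(control)

print(f"Treated sites, next month: {avg_treated:.2f}")
print(f"Control sites, next month: {avg_control:.2f}")
```

Both groups drop back to an ordinary number of incidents.  Looking only at the treated sites, the intervention seems like a triumph; the control group is what reveals that it made no difference here.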

Businesses do not have such luxuries.  Decisions need to be made quickly and correctly – or else someone could lose their job (or their life!).  For this reason, it is often common practice to go ahead and perform the intervention.  If the values return to normal, then you seem like a genius.  If they do not, then at least you tried.  On the other hand, if you do nothing and the values return to normal, then you seem like a genius again.  If they do not return to normal, however, then it seems like you ignored the severity of the issues.  The table below summarizes this issue:

Action          Values Remain Extreme     Values Return to Normal
Do Nothing      You Ignored the Issue     You Succeeded!
Do Something    You Tried                 You Succeeded!

Long story short, you should probably make an attempt to fix the issue, although it may simply be Statistical Bullshit in the end.

Before concluding, one last question should be answered about Regression Toward the Mean: How exactly can people make a career on it?

Well, imagine that you are a safety consultant, and you receive several consulting offers at once.  You look at the companies, and they all seem to have a relatively stable number of incidents; however, you notice one that is going through a period of elevated incidents.  Now that you know about Regression Toward the Mean, you know that you should take this company’s offer.  Not only will they (probably) be willing to spend lots of money, but you (probably) need to do very little to reduce the incident rate.  Even if your safety suggestions are bogus, you can still appear to be a competent safety consultant.  Although it may sound crazy, I think you would be surprised how often this occurs in the real world.

That is all for Regression Toward the Mean.  Do you have your own Regression Toward the Mean story?  Maybe a question?  Feel free to email me at MHoward@SouthAlabama.edu.  Until next time, watch out for Statistical Bullshit!