Practice in Excel and SPSS

Need a little practice in Excel and SPSS?  On the Statistics Help section of MattCHoward.com, I recently uploaded several practice pages.  Links to each individual page are below.

Chi-Square Practice

Correlation Practice

Dummy-Coded Regression Practice

One-Way ANOVA Practice

Quadratic Regression Practice

Regression Practice

Two-Sample T-Test Practice

Two-Way ANOVA Practice

As always, please email me at MHoward@SouthAlabama.edu if you have any questions, comments, concerns, or otherwise!

ANOVA and Chi-Square in Excel and SPSS

More guides for statistical analyses in Excel and SPSS on MattCHoward.com!  This week, I added four guides.  The links can also be found below:

One-Factor ANOVA in Excel

Chi-Square in SPSS

One-Way ANOVA in SPSS

Two-Way ANOVA in SPSS

As always, feel free to email me at MHoward@SouthAlabama.edu if you have any questions, comments, stories, or requests for guides!

 

One-Sample and Two-Sample T-Test in Excel and SPSS

More guides for statistical analyses in Excel and SPSS on MattCHoward.com!  This week, I added three guides.  The links can also be found below:

One-Sample T-Test in SPSS

Two-Sample T-Test in SPSS

Two-Sample T-Test in Excel

As always, feel free to email me at MHoward@SouthAlabama.edu if you have any questions, comments, stories, or requests for guides!

Quadratic & Dummy-Coded Regression in Excel and SPSS

More guides for statistical analyses in Excel and SPSS on MattCHoward.com!  This week, I added four guides.  The links can also be found below:

Dummy-Coded Regression in Excel

Dummy-Coded Regression in SPSS

Quadratic Regression in Excel

Quadratic Regression in SPSS

As always, feel free to email me at MHoward@SouthAlabama.edu if you have any questions, comments, stories, or requests for guides!

Regression and More in Excel and SPSS

As mentioned last week, I still have my gears shifted for the rest of the academic semester.  Instead of Statistical Bullshit posts, I’m making guides for statistical analyses in Excel and SPSS on MattCHoward.com.  This week, I added three guides.  One on regression in ExcelOne on regression in SPSSAnd one on partial correlation in SPSS.  The links can also be found below:

Regression in Excel

Regression in SPSS

Partial Correlation in SPSS

As always, feel free to email me at MHoward@SouthAlabama.edu if you have any questions, comments, stories, or requests for guides!

Excel and SPSS Help!

It is becoming that time of the semester where my classes are more time-consuming than I expect.  For that reason, I am shifting gears (slightly), and I am going to focus more on developing statistical guides for Excel and SPSS on MattCHoward.com.  Fortunately, I already have a few guides made.  So, if you need help with calculating statistics in Excel and SPSS…

Click here for the Excel help page.

And click here for the SPSS help page.

As always, if you have any questions, comments, stories, or anything else, feel free to contact me a MHoward@SouthAlabama.edu.  I always have time to write about Statistical Bullshit stories, even if I am focused on creating statistical guides!

Bullshit Outliers

Once upon a time, I asked a colleague for their dataset featured in a published article.  The article produced a significant correlation between two variables, and I wanted to reproduce the findings using their dataset.  Lo and behold, I was able to successfully replicate the correlation!  However, when I further inspected their data, I noticed a particular concern.  It seemed that the significance of the relationship hinged on a single value that could be considered an outlier.  I struggled with whether I should bring this up to my colleague, but I finally did after much thought.  Their response educated me on outliers, but it also exposed me to a new type of Statistical Bullshit – and how reasonable statistical practices could be confused for Statistical Bullshit.

Today’s post first discusses Statistical Bullshit surrounding outliers.  Then, I summarize the discussion that I had with my colleague, and why things don’t always seem as they appear when it comes to outliers and Statistical Bullshit!


While many different types of outliers can be classified, today’s post discusses three.  The first uses the following dataset:  Click Here for Dataset .  In this dataset, we have job satisfaction and job performance recorded for 29 employees.  Each scale ranges from 0 to 100.  When we calculate a correlation between these two variables, we get a perfect and statistically significant relationship (r = 1.00, p < .01).  But let’s look at a scatterplot of this data.

Bullshit Outliers Figure 1

Hmm, clearly something is wrong!  This is because one employee had missing data for both variables, and the missing data was coded as 9999; however, when the analyses were performed, the 9999 was not properly removed and/or the program was not told that 9999 represented missing data.  When we run the analyses again with the outlier removed, the correlation is small and not statistically significant (r = .08, p > .05).

Bullshit Outliers Figure 2

I label this type of outlier as a researcher error outlier.  Numerically it is an outlier, but it does not represent actual data.  In all cases, this outlier should certainly be removed.

Next, let’s use the following dataset to discuss a second type of outlier: Click Here for Dataset.  Again, we have job satisfaction and job performance recorded for 29 employees.  Each scale ranges from 0 to 100.  When we calculate a correlation between these two variables, we get a very strong and statistically significant relationship (r = .76, p < .01).  But let’s look at a scatterplot of this data.

Bullshit Outliers Figure 3

Interesting.  We certainly have an outlier, but it is not clearly “wrong.”  Instead, it seems that most of the sample falls within the range of 0-30 for each variable, but one person had a value of 100 for both.  When we run the analyses again with the outlier removed, the correlation is small and not statistically significant (r = .01, p > .05).

Bullshit Outliers Figure 4

But, before being satisfied with our removal, we should strongly consider what this means for our data.  The occurrence of the one person certainly throws off our results, but this one person does indeed represent actual, meaningful data.  So, can we justify removing this person?  This question can be partially answered by determining whether we are interested in all employees or typical employees.  If we are interested in all employees, then the outlier should certainly stay in the dataset.  If we are interested in typical employees, then the outliers should possibly be removed.  No matter the decision, however, researchers and practitioners should report all of their analytical decisions, so that any readers could be aware of changes to the data before further analyses were conducted.

I label this type of outlier as an extreme outlier.

Lastly, let’s use the following dataset to discuss a third type of outlier:  Click Here for Dataset.  Again, we have job satisfaction and job performance recorded for 29 employees.  Each scale ranges from 0 to 100.  When we calculate a correlation between these two variables, we get a moderate and statistically significant relationship (r = .32, p < .01).  But let’s look at a scatterplot of this data.

Bullshit Outliers Figure 5

Now, we certainly have an outlier, but it is much closer to the other values.  While most of the sample falls within the range of 0-30 for each variable, but one person had a value of 43 for both.  When we run the analyses again with the outlier removed, the correlation is small and not statistically significant (r = .07, p > .05).

Bullshit Outliers Figure 6

Like the prior case, we need to strongly consider whether we should remove this participant.  It would be much more difficult to argue that this person represents unreasonable data, and it may even be difficult to argue that this person represents data that deviates from the typical population.  Yes, the person is a little extreme, but they are not drastically different.  For this reason, it is likely that we want to keep this observation within our sample.  I label this type of outlier as a reasonable outlier.

So, how is this relevant to Statistical Bullshit?  Well, for researcher error outliers, the entire significance of a relationship could be built on a single mistake.  Large decisions could be made based on nothing factual at all.  Similarly, for extreme outliers, our relationship could largely be driven by a single person, and our decisions could be overly influenced by this single person.  Lastly, for reasonable outliers and some extreme outliers, we could choose to remove these observations, which could result in a very different relationship that we could base our decisions.  However, our decisions would be based on only a portion of the sample, and we could be missing out on very important aspects of the population.  Thus, both not removing and removing outliers could result in Statistical Bullshit!

To bring this post full-circle, what happened when I chatted with my colleague?  As you probably guessed, the outlier was determined to be a reasonable outlier.  Certainly an outlier, but not enough to be confidently considered outside the typical population range.  After our conversation, I certainly saw their point, and felt that they made the correct decision with their analyses – and it helped me understand how to conduct my analyses in the future.

Well, that’s all for today!  If you have any questions, comments, or stories, please email me at MHoward@SouthAlabama.edu.  Until next time, watch out for Statistical Bullshit!