“Linear” does not mean “straight line”: Translating the stats language

When we think of a line, we think of something like this: ______ . It’s flat. Maybe it slants up, maybe it slants down, but it’s straight, continues in a predictable direction, and has one number to describe its slope. In statistics, when we use the term “linear model,” we are not necessarily describing a straight line. A statistical linear model can describe the classic straight line, but many linear models are represented by curvilinear graphs instead. Both shapes in this picture are “linear”:

[Image: screenshot showing two shapes, both “linear”]

Why does this merit a blog post? For the third time this month, people have expressed surprise when I told them that “linear” does not mean “straight line.”

What is a linear model?

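We use the term linear in statistics to describe how the parameters enter the model we are using. Linear means that the response (y variable) is expected to be a linear combination of terms, where each term is a parameter multiplied by an explanatory variable (either discrete or continuous) or by some function of one, such as x². Linear refers to these terms being additive, with each parameter appearing only as a multiplier. That is why y = b0 + b1x + b2x² is a linear model even though its graph is a curve.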
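To make “linear in the parameters” concrete, here is a minimal sketch in plain Python (my choice for illustration; the club works in R). It fits a quadratic curve using nothing but ordinary least squares. The data are made up and noise-free, and the helper names `solve` and `fit_linear_model` are hypothetical, not from any library.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_linear_model(columns, y):
    """Least-squares coefficients for y ~ sum_j b_j * columns[j] (normal equations)."""
    k = len(columns)
    XtX = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    Xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    return solve(XtX, Xty)


# Made-up data lying exactly on the curve y = 2 + 0.5x - 1.5x^2.
x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [2.0 + 0.5 * xi - 1.5 * xi ** 2 for xi in x]

# The columns are 1, x and x^2. They are nonlinear in x, but the model is
# just a weighted sum of them, so it is linear in b0, b1 and b2.
columns = [[1.0] * len(x), x, [xi ** 2 for xi in x]]
b0, b1, b2 = fit_linear_model(columns, y)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

The fitted curve bends, yet the fitting step never needed anything beyond the additive, one-multiplier-per-term structure. That structure is what the word “linear” is pointing at.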

Why do we use linear models?

Ecologists use linear models because most of the time they are extremely useful for predicting actual response variables. We use them because they work! We can test whether this is true for our own data sets by looking at the residuals. A residual describes how much an observed data point differs from the value the model predicts for it. If we plot the residuals, we should see a random scatter around a flat line at zero. If we see a U-shape or an inverted U-shape instead, the linear model we chose may not be a good fit. A U-shape means that the residuals are not randomly distributed across the data set, which indicates non-linearity.
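As a rough sketch of that residual check, the plain-Python snippet below fits a straight line to made-up data that actually follow a curve, then looks at the sign of each residual (observed minus predicted). The helper `fit_line` is a hypothetical name, and the data are invented purely for illustration.

```python
def fit_line(x, y):
    """Closed-form least-squares intercept and slope for y ~ b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up data that follow y = x^2, deliberately fitted with a straight line.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [xi ** 2 for xi in x]

b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Positive at both ends and negative in the middle: the U-shape.
signs = "".join("+" if r > 0 else "-" for r in residuals)
print(signs)  # prints +----+
```

With real, noisy data the pattern is rarely this clean, which is why plotting the residuals is still the better habit than reading off signs.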

How do we deal with non-linearity?

Most of the time we don’t. Non-linear models are more difficult to deal with. Instead, we can apply nonlinear transformations that turn a non-linear relationship between variables into a linear one.

What transformations do we use?

There are five basic nonlinear transformations: exponential, quadratic, reciprocal, logarithmic and power. For a handy-dandy chart on how to perform each one, see the chart here. Applying a transformation to either the independent or dependent variable (sometimes both) changes the relationship between them. We use the transformation to increase linearity, and thus we need to re-check the residual plots after transformation to ensure that we get closer to the random scatterplot for residuals (we don’t want to exacerbate that U-shape). Plotting and checking is always a good test.
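Here is a small sketch of one such case, assuming a response that grows exponentially with x: taking the log of y turns the relationship into a straight line that an ordinary fit can handle. The data are made up and noise-free, `fit_line` is a hypothetical helper, and plain Python is used only for illustration.

```python
import math


def fit_line(x, y):
    """Closed-form least-squares intercept and slope for y ~ b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up exponential growth: y = 2 * exp(0.7 * x).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * math.exp(0.7 * xi) for xi in x]

# Transform the response: log(y) = log(2) + 0.7 * x is a straight line in x.
log_y = [math.log(yi) for yi in y]
b0, b1 = fit_line(x, log_y)

# Re-check the residuals after transforming, as the post recommends.
residuals = [ly - (b0 + b1 * xi) for xi, ly in zip(x, log_y)]
largest = max(abs(r) for r in residuals)

print(round(math.exp(b0), 6), round(b1, 6))  # recovered multiplier and growth rate
print(largest < 1e-9)                        # residuals are flat after the transform
```

Because these data are noise-free, the residuals vanish after the transform. With real data the goal is only a random scatter with no leftover curve.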

Key take-away: linearity can imply straight lines but we need to be careful where we look for those lines. Looking in the residuals we want a flat line with zero slope. Looking at our model fit, those curvilinear lines are a-okay.

Mathematical ecology club is moving through more complicated examples of how and when to transform data, and what R does internally when we specify linear models. If you are interested in learning more about how to use statistical models with your data correctly, and in understanding what you are doing with each R command, pick up a copy of Marc Kery’s book and join us on Sunday mornings at Saints.

More stats posts to come!

About johannaohm

Johanna Ohm is a graduate student in Biology at Penn State University.

3 Responses to “Linear” does not mean “straight line”: Translating the stats language

  1. johannaohm says:

    As confusing as it sounds (and I agree that this is a problem in the language we use for statistics) linear can also mean curved line for statisticians.

    You can read more at these links:

    -“In statistical terms, any function that meets these criteria would be called a “linear function”. The term “linear” is used, even though the function may not be a straight line”
    “linear models are not limited to being straight lines or planes, but include a fairly wide range of shapes. For example, a simple quadratic curve”

    -If you like minitab: (note quadratic and cubic curves are still linear fits)

    Hope this helps! If you would like to read a book on the subject that goes into more detail of how statisticians think, check out the first few chapters of Marc Kery’s Introduction to WinBUGS for Ecologists

  2. ConfusedByBlogs says:

    Well, now I’m really confused. There seem to be many contradictory explanations of what linear means. Certainly the links you provide are consistent with the blog. Yet, there are others, such as this: http://people.duke.edu/~rnau/regintro.htm, which explicitly state that linear regression equates to straight line relationships (Khan Academy also). Are people talking about different things altogether or are some of these simply wrong?
    Based on this explanation (and the NIST one you provide): http://stats.stackexchange.com/questions/59782/linear-regression-explanations perhaps it really does boil down to additivity of the model parameters as you state. i.e. y = b0+b1x is linear, but y = b0+b0*b1x is not (both are actually straight lines). How additivity serves as a synonym for linearity is beyond me!
