How does using a logarithm help achieve linearity in a distribution? Please explain; a diagram would be cool :D

mlehuzzah | Student, Graduate | (Level 1) Associate Educator

I think this is what your question refers to:


When you have a bunch of data points, and you would like to model them with an equation, one way to do that is to use the "least squares" method to find the straight line that best fits them.

For example, the Wikipedia page "Least squares" has a picture of this: the blue dots are data points, and the red line is a best-fit line.
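To make "least squares" a bit more concrete, here is a rough sketch in Python of how a best-fit line could be computed. The helper name `fit_line` and the data points are made up for illustration (they are not taken from the Wikipedia picture); the formulas are the standard ones for the slope and intercept of an ordinary least squares line.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points scattered around y = 3x + 1 (illustrative data only).
xs = [0, 1, 2, 3, 4]
ys = [1.1, 3.9, 7.2, 9.8, 13.0]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} x + {intercept:.2f}")
```

The fitted slope and intercept come out close to 3 and 1, which is what you would hope for with points scattered around that line.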


Some relationships aren't linear, however. For example, consider the exponential function `y=5e^(2x)`.

[Graph: `y=5e^(2x)`, an exponential curve that rises more and more steeply as x increases.]


Now, suppose we didn't know the equation, but we just had a whole bunch of data points that surrounded the curve `y=5e^(2x)` , and we wanted to figure out that the equation was `y=5e^(2x)`

If you did a least squares analysis on these hypothetical points, you would get a straight line, which wouldn't really match the data.


Try taking the (natural) logarithm:

`"log" y = "log" (5 e^(2x)) = "log" 5 + "log" (e^(2x)) = "log" 5 + 2x`

Let's call Y = log y.

And log 5 is approximately 1.6.

So, what you get is:

`Y = 2x + 1.6`

This is a straight line!


What that means is, if your collection of data points (dots on a graph) looks exponential, then you can take the logarithm of all the y-values, and do least squares on the points (x, log y).  You wind up with a straight line (in this example, you will have figured out that your slope is 2 and your y-intercept is about 1.6, which is log 5).  You can convert this back to the exponential relationship by reversing the steps:

`"log" y = 2x+1.6`

`y = e^(2x+1.6) = e^(1.6) e^(2x) = 5e^(2x)`

So you can use a straight line to help you model exponential relationships, by using a logarithm.
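Here is a small Python sketch of that whole process, in case it helps. The data are hypothetical: points on `y=5e^(2x)` nudged by a few percent so they don't sit exactly on the curve. The helper `fit_line` is just an illustrative name for an ordinary least squares fit, and `math.log` is the natural logarithm, which matches log 5 being about 1.6.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: points on y = 5 e^(2x), nudged by a few percent.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
noise = [1.03, 0.98, 1.01, 0.97, 1.02]
ys = [5 * math.exp(2 * x) * f for x, f in zip(xs, noise)]

# Fit a straight line to the points (x, log y).
log_ys = [math.log(y) for y in ys]
k, log_a = fit_line(xs, log_ys)

# Reverse the logarithm to recover y = a * e^(k x).
a = math.exp(log_a)
print(f"log y = {k:.2f} x + {log_a:.2f}")
print(f"y = {a:.2f} e^({k:.2f} x)")
```

With data like these, the slope should land near 2 and the intercept near 1.6, so the recovered equation is close to the original exponential.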


Or, suppose your data points were all near the curve

`y=(1.5) x^2.5`

But suppose we didn't have the actual equation `y=1.5 x^2.5`

We just had a bunch of dots that were near the curve on the graph.  How could we use logarithms to figure out the equation?

`"log" y = "log" 1.5 + "log" (x^2.5) = "log" 1.5 + 2.5 "log" x`

If we write Y = log y and X = log x, and note that log 1.5 is approximately 0.4, we have:

`Y = 0.4 + 2.5 X`

Again, this is a straight line.  What this means is, if we took the logarithms of the x and y coordinates of all our data points, and plotted those instead, we would get data points that resembled a straight line, and we could do a least squares analysis of them.  Working backwards, we could get an equation that modeled the original data, even though the original data weren't in a straight line.
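The same idea can be sketched in Python for the power relationship. Again, the data are hypothetical (points on `y=1.5 x^2.5` nudged by a few percent), `fit_line` is an illustrative helper name, and the logarithm is the natural one. The only difference from the exponential case is that we take the logarithm of both x and y before fitting.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: points on y = 1.5 x^2.5, nudged by a few percent.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
noise = [1.03, 0.98, 1.01, 0.97, 1.02]
ys = [1.5 * x ** 2.5 * f for x, f in zip(xs, noise)]

# Fit a straight line to the points (log x, log y).
log_xs = [math.log(x) for x in xs]
log_ys = [math.log(y) for y in ys]
p, log_c = fit_line(log_xs, log_ys)

# Working backwards: Y = log_c + p * X means y = c * x^p.
c = math.exp(log_c)
print(f"Y = {log_c:.2f} + {p:.2f} X")
print(f"y = {c:.2f} x^{p:.2f}")
```

The slope of the fitted line gives the exponent (near 2.5 here), and raising e to the intercept gives the coefficient (near 1.5).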



If your data seems to show an exponential relationship, or a power relationship, then you can use logarithms to transform your data into a straight line, use "least squares" to figure out that line (its slope and intercept), and then produce an equation that still models your data.