SEO Blog

Posts Tagged ‘Data’

Post-Panda: Data Driven Search Marketing


“Now is the best and exciting time to be in marketing. The new data-driven approaches and infrastructure to collect customer data are truly changing the marketing game, and there is incredible opportunity for those who act upon the new insights the data provides” – Mark Jeffrey, Kellogg School of Management

I think Jeffrey is right – now is one of the best and most exciting times to be in marketing!

It is now cheap and easy to measure marketing performance, so we are better able to spot and seize marketing opportunities. If we collect and analyze the right data, we will make better decisions, and increase the likelihood of success.

As Google makes their system harder to game using brute force tactics, the next generation of search marketing will be tightly integrated with traditional marketing metrics such as customer retention, churn, profitability, and customer lifetime value. If each visitor is going to be more expensive to acquire, then we need to make sure those visitors are worthwhile, and the more we engage visitors post-click, the more relevant our sites will appear to Google.

We’ll look at some important metrics to track and act upon.

But first….

Data-Driven Playing Field

There is another good reason why every search marketer should understand data-driven thinking, even if some choose to take a different approach.

Google is a data-driven company.

If you want to figure out what Google is going to do next, then you need to think like a Googler.
Googlers think about – and act upon – data.


Douglas Bowman, a designer at Google, left the company because he felt they placed too much reliance on data over intuition when it came to visual design decisions.

Yes, it’s true that a team at Google couldn’t decide between two blues, so they’re testing 41 shades between each blue to see which one performs better. I had a recent debate over whether a border should be 3, 4 or 5 pixels wide, and was asked to prove my case. I can’t operate in an environment like that. I’ve grown tired of debating such miniscule design decisions. There are more exciting design problems in this world to tackle

Regardless of whether you think acting on data or intuition is the right idea, if you can relate to the data-driven mindset and the company culture that results, you will better understand Google. Searcher satisfaction metrics are writ large on Google’s radar and they will only get more refined and granular as time goes on.

The Panda update was all about user engagement issues. If a site does not engage users, it is less likely to rank well.

As Jim Boykin notes, Google is interested in the “long click”:

On the most basic level, Google could see how satisfied users were. To paraphrase Tolstoy, happy users were all the same. The best sign of their happiness was the “long click”. This occurred when someone went to a search result, ideally the top one, and did not return. That meant Google has successfully fulfilled the query. But unhappy users were unhappy in their own ways, most telling were the “short clicks” where a user followed a link and immediately returned to try again. “If people type something and then go and change their query, you could tell they aren’t happy,” says (Amit) Patel. “If they go to the next page of results, it’s a sign they’re not happy. You can use those signs that someone’s not happy with what we gave them to go back and study those cases and find places to improve search.”

In terms of brand, the more well known you are, the more some of your traffic is going to be pre-qualified. Brand awareness can lower your bounce rate, which leads to better engagement signals.

Any site is going to have some arbitrary brand-related traffic and some generic search traffic. Where a site has good brand-related searches, those searches create positive engagement metrics which lift the whole of the site. The following chart is conceptual, but it drives the point home. As more branded traffic gets folded into the mix, aggregate engagement metrics improve.
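To put rough numbers on that idea, here is a minimal Python sketch of how a blended bounce rate shifts as the share of branded traffic grows. The two bounce rates are hypothetical values picked purely for illustration, not measurements from any site, and the function name is my own.

```python
def blended_bounce_rate(branded_share, branded_bounce=0.25, generic_bounce=0.65):
    """Weighted-average bounce rate for a mix of branded and generic visits.

    Both default bounce rates are hypothetical; they only encode the
    assumption that branded visitors bounce less often than generic ones.
    """
    if not 0.0 <= branded_share <= 1.0:
        raise ValueError("branded_share must be between 0 and 1")
    return branded_share * branded_bounce + (1.0 - branded_share) * generic_bounce


if __name__ == "__main__":
    # The aggregate rate falls steadily as branded traffic is folded in.
    for share in (0.0, 0.25, 0.5, 0.75):
        print(f"{share:.0%} branded -> {blended_bounce_rate(share):.1%} bounce rate")
```

Under these assumptions the site-wide figure improves even though neither group of visitors behaves any differently, which is the whole point of the conceptual chart.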

If your site and business metrics look good in terms of visitor satisfaction – i.e. people are buying what you offer and/or reading what you have to say, and recommending you to their friends – it’s highly likely your relevancy signals will look positive to Google, too. People aren’t just arriving and clicking back. They are engaging, spending time, talking about you, and returning.

Repeat visits to your site, especially from logged-in Google users with credit cards on file, are yet another signal Google can look at to see that people like, demand and value what you offer.

Post-Panda, SEO is about the behavior of visitors post-click. In order to optimize for visitor satisfaction, we need to measure their behavior post-click and adjust our offering. A model that I’ve found works well in a post-Panda environment is a data-driven approach, often used in PPC. Yes, we still have to do link building and publish relevant pages, but we also have to focus on the behavior of users once they arrive. We collect and analyze behavior data and feed it back into our publication strategy to ensure we’re giving visitors exactly what they want.

What Is Data Driven Marketing?

Data driven marketing is, as the name suggests, the collection and analysis of data to provide insights into marketing strategies.

It’s a way to measure how relevant we are to the visitor, as the more relevant we are, the more positive our engagement metrics will be. A site can constantly be adapted, based on the behavior of previous visitors, in order to be made even more relevant.

Everyone wins.

The process involves three phases: setting up a framework to measure and analyze visitor behaviour, testing assumptions using visitor data, then optimizing content, channels and offers to maximize return. This process is used a lot in PPC.

Pre-web, this type of data used to be expensive to collect and analyse. Large companies engaged market researchers to run surveys, focus groups, and go out on the street to gather data.

These days, collecting input from consumers and adapting campaigns is as easy as firing up analytics and creating a process to observe behaviour and modify our approach based on the results. High-value data analysis and marketing can be done on small budgets.

Yet many companies still don’t do it.

And many of those that do aren’t measuring the right data. By capturing and analysing the right data, we put ourselves at a considerable advantage to most of our competitors.

In his book Data Driven Marketing, Jeffrey notes that the lower performing companies in the Fortune 500 were spending 4% less than the average on marketing, and the high performers were investing 20% more than average. Low performers focused on demand generation – sales, coupons, events – whereas high performers spend a lot more on brand and marketing infrastructure. Infrastructure includes the processes and software tools needed to capture and analyse marketing data.

So the more successful companies are spending more on tools and process than lower performing companies.

When it comes to the small/medium sized businesses, we have most of the tools we need readily available. Capturing and analyzing the right data is really about process and asking the right questions.

What Are The Right Questions?

We need a set of metrics that help us measure and optimize for visitor satisfaction.

Jeffrey identifies 15 data-analysis areas for marketers. Some of these metrics relate directly to search marketing, and some do not. However, it’s good to at least be aware of them, as these are the metrics traditional marketing managers use, so they might serve as inspiration to get us thinking about where the cross-overs into search marketing lie. I recommend reading his book to anyone who wants a crash course in data-driven marketing and to better understand how marketing managers think.

  • Brand awareness
  • Test Drive
  • Churn
  • Customer satisfaction
  • Take rate
  • Profit
  • Net Present Value
  • Internal Rate Of Return
  • Payback
  • Customer Lifetime Value
  • Cost Per Click
  • Transaction Conversion Rate
  • Return On Ad Dollars Spent
  • Bounce Rate
  • Word Of Mouth (Social Media Reach)

I’ll re-define this list and focus on a few metrics we could realistically use that help us optimize sites and offers in terms of visitor engagement and satisfaction. As a bonus, we’ll likely create the right relevancy signature Google is looking for which will help us rank well. Most of these metrics come directly from PPC.

First, we need a…..dashboard! Obviously, a dashboard is a place where you can see how you’re progressing, at a glance, measured over time. There are plenty of third party offerings, or you can roll-your-own, but the important thing is to have one and use it. You need a means to measure where you are, and where you’re going in terms of visitor engagement.

1. Traffic Vs Leads

Traffic is a good metric for display and brand purposes. If a site is making money based on how many people see the site, then they will be tracking traffic.

For everyone else, combining the two can provide valuable insights. If traffic has increased, but the site is generating the same number of leads – or whatever your desired engagement action may be, but I’ll use the term “leads” to mean any desired action – then is that traffic worthwhile? Track how many leads are closed and this will tell you if the traffic is valuable. If the traffic is high, but engagement is low, then visitors are likely clicking back, and this is not a signal Google deems favorable.

This data is also the basis for adjusting and testing the offer and copy. Does engagement increase or decrease after you’ve adjusted the copy and/or the offer?

2. Search Channel Vs Other Channels

Does search traffic result in more leads than, say, social media traffic? Does it result in more leads vs any other channel? If so, then there is justification to increase spending on search marketing vs other channels.

Separate marketing channels out so you can compare and contrast.
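As a rough illustration of comparing channels side by side, the sketch below ranks channels by leads per visit. The channel names and all the figures are hypothetical; in practice they would come from your analytics export.

```python
def lead_rate(visits, leads):
    """Leads per visit; returns 0.0 when a channel has no traffic."""
    return leads / visits if visits else 0.0


# Hypothetical monthly figures per channel: (visits, leads)
channels = {
    "organic search": (12000, 240),
    "paid search": (4000, 120),
    "social": (9000, 45),
}

# Rank channels from the highest lead rate to the lowest.
ranked = sorted(channels, key=lambda name: lead_rate(*channels[name]), reverse=True)

if __name__ == "__main__":
    for name in ranked:
        visits, leads = channels[name]
        print(f"{name}: {lead_rate(visits, leads):.2%} of visits became leads")
```

Note that the channel with the most traffic is not necessarily the one with the best lead rate, which is why the two numbers are worth tracking together.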

3. Channel Growth

Is the SEM channel growing, staying the same, or declining vs other channels?

Set targets and incremental milestones. Create a process to adjust copy and offers and measure the results. The more conversions to desired action, the better your relevancy signal is likely to be, and the more you’ll be rewarded.

You can get quite granular with this metric. If certain pages are generating more leads than others as the direct result of keyword clicks, then you know which keyword areas to grow and exploit in order to grow the performance of the channel as a whole. It can be difficult to isolate if visitors skip from page to page, but it can give you a good idea which entry pages and keywords kick it all off.

4. Paid Vs Organic

If a search campaign is running both PPC and SEO, then split these two sources out. Perhaps SEO produces more leads, in which case this will justify creating more blog posts, articles, link strategies, and so on.

If PPC produces more leads, then the money may be better spent on PPC traffic, optimizing offers and landing pages, and running A/B tests. Of course, the information gleaned here can be fed into your organic strategies. If the content works well in PPC, it is likely to work well in SEO, at least in terms of engagement.

5. Call To Action

How do you know if a call to action is working? Could the call to action be worded differently? Which version of the call to action works best? In which position does it work best? Does the color of the link make a difference?

This type of testing is common in PPC, but less so in SEO. If SEO pages are optimized in this manner, then we increase the level of engagement and reduce the click-back.

6. Returning Visitor

If all your visitors are new and never return, then your broader relevance signals aren’t likely to be great.

This doesn’t mean all sites must have a high number of return visitors in order to be deemed relevant. One-off sales sites would be unlikely to have return visitors, yet a blog would. However, if your site is in a class of sites where every other site listed is receiving return visits, then your site is likely to suffer by comparison.

Measure the number of return visitors vs new visitors. Think about ways you can keep visitors coming back, especially if you suspect that your competitors have high return visitor rates.

7. Cost Per Click/Transaction Conversion Rate/Return On Ad Dollars Spent

PPC marketers are familiar with these metrics. We pay per click (CPC) and hope the visitor converts to the desired action. We get a better idea of the effectiveness of keyword marketing when we combine this metric with transaction conversion rate (TCR), the percentage of customers who purchase after clicking through to your website, and return on ad dollars spent (ROA).

These are good metrics for SEOs to get their heads around, too, especially when justifying SEO spends relative to other channels. For cost per click, use the going rate on Adwords and assign it to the organic keyword if you want to demonstrate value. If you’re getting visitors in at a much lower price per click, the SEO channel looks great. You can also calculate the cost-per-click in SEO directly: it is the total cost of the SEO campaign divided by the clicks it delivered over the same period.
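Here is a minimal sketch of those three calculations applied to an SEO campaign. All the figures are hypothetical, and ROA is taken here simply as revenue divided by spend, which is an assumption on my part; check the definition your client uses before reporting it.

```python
def cost_per_click(total_cost, clicks):
    """Total campaign cost divided by the clicks it produced."""
    return total_cost / clicks


def transaction_conversion_rate(transactions, clicks):
    """Share of clicks that ended in a purchase."""
    return transactions / clicks


def return_on_ad_dollars(revenue, total_cost):
    """Revenue earned per dollar spent (assumed definition of ROA)."""
    return revenue / total_cost


# Hypothetical figures for one quarter of SEO work.
seo_cost = 3000.0        # total cost of the SEO campaign
organic_clicks = 6000    # organic visits on the targeted keywords
transactions = 90        # purchases from those visits
revenue = 9000.0         # revenue from those purchases
adwords_rate = 1.20      # hypothetical going AdWords rate per click

seo_cpc = cost_per_click(seo_cost, organic_clicks)
tcr = transaction_conversion_rate(transactions, organic_clicks)
roa = return_on_ad_dollars(revenue, seo_cost)
equivalent_ppc_cost = adwords_rate * organic_clicks

if __name__ == "__main__":
    print(f"SEO cost per click: ${seo_cpc:.2f} (AdWords rate ${adwords_rate:.2f})")
    print(f"Transaction conversion rate: {tcr:.1%}")
    print(f"Return on ad dollars spent: {roa:.1f}x")
    print(f"Same clicks bought via PPC: ${equivalent_ppc_cost:,.0f}")
```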

8. Bounce Rate

Widely speculated to be an important metric post-Panda. Obviously, we want to get this rate down, Panda or not.

If you’re seeing good rankings but high bounce rates for pages, it’s likely because the page content isn’t relevant enough. It might be relevant in terms of content as far as the algorithm sees it, but not relevant in terms of visitor intent. Such a page may drift down the rankings over time as a result, and it certainly doesn’t do other areas of your business any good.

9. Word Of Mouth (Social Media Reach/Brand)

Are other people talking about you? Do they repeat your brand name? Do they do so often? If you can convince enough people to search for you based on your name, then you’ll “own” that word. Google must return your site, else they’ll be seen as lacking.

Measuring word-of-mouth used to be difficult but it’s become a lot easier, thanks to social media and the various information mining tools available. Aaron has written a lot on the impact of brand in SEO, so if this area is new to you, I’d recommend reading back through The Rise Of Brand Over Time, Big Brands and Potential Brand Signals For Panda.

10. Profit

It’s all about the bottom line.

If search marketers can demonstrate they add value to the bottom line, then they are much more likely to be retained and have budget increased. This isn’t directly related to Panda optimization, other than in the broad sense that the more profitable the business, the more likely they are keeping visitors satisfied.

Profit = revenue – cost. Does the search marketing campaign bring in more revenue than it costs to run? How will you measure and demonstrate this? Is the search marketing campaign focused on the most profitable products, or the least? Do you know which products and services are the most profitable to the business? What value does your client place on a visitor?

There is no one way of tracking this. It’s a case of being aware of the metric, then devising techniques to track it and add it to the dashboard.

11. Customer Lifetime Value

Some customers are more important than others. Some customers convert, buy the least profitable service or product, and we never hear from them again. Some buy the most profitable service or product, and return again and again.

Is the search campaign delivering more of the former, or the latter? Calculating this value can be difficult, and relies on internal systems within the company that the search marketer may not have access to, but if the company already has this information, then it can help validate the cost of search marketing campaigns and to focus campaigns on the keyword areas which offer the most return.

Some of these metrics don’t specifically relate to ranking; they’re about marketing value. But they illustrate how traditional marketing metrics and those of search marketers are starting to overlap. The metrics I’ve outlined are just some of the many metrics we could use, and I’d be interested to hear what other metrics you’re using, and how you’re using them.

Optimizing For Visitor Experience

If you test these metrics, then analyse and optimize your content and offers based on your findings, not only will this help the bottom line, but your signature on Google, in terms of visitor relevance, is likely to look positive because of what the visitor does post-click.

When we get this right, people are engaging. They are clicking on the link, they’re staying rather than clicking back, they’re clicking on a link on the page, they’re reading other pages, they’re interacting with our forms, they’re book-marking pages or telling others about our sites on social media. These are all engagement signals, and increased engagement tends to indicate greater relevance.

This is diving deeper than a traditional SEO-led marketing approach, which until quite recently worked, even if you only operated in the search channel and put SEO at the top of the funnel. It’s not just about the new user and the first visit, it’s also about the returning visitor and their level of engagement over time. The search visitor has a value way beyond that first click and browse.

Data-driven content and offer optimization is where SEO is going.


SEO Book

Excel Statistics for SEO and Data Analysis


Posted by Virgil

Everybody has probably already realized that there is almost no data that we cannot get. We can get data about our website by using free tools, but we also spend tons of money on paid tools to get even more. Analyzing the competition is just as easy: competitive intelligence tools are everywhere, and we often use Compete or Hitwise. Open Site Explorer is great for getting more data about our own and our competitors' backlink profiles. No matter what information we are trying to get, we can get it, whether by spending a fortune or no money at all. My favorite part is that almost every tool has one common feature, and that is the "Export" button. This is the most powerful feature of all these tools, because by exporting the data into Excel we can sort it, filter it and model it in any way we want. Most of us use Excel on a regular basis and are familiar with the basic functions, but Excel can do way more than that. In the following article I will try to present the most common statistical techniques, and the best part is that we don't have to memorize complicated statistical equations: everything is built into Excel!

Statistics is all about collecting, analyzing and interpreting data. It comes very handy when decision making faces uncertainty. By using statistics, we can overcome these situations and generate actionable analysis.

Statistics is divided into two major branches, descriptive and inferential.

Descriptive statistics are used when you know all the values in the dataset. For example, you take a survey of 1000 people asking if they like oranges, with two choices (Yes and No). You collect the results and you find out that 900 answered Yes, and 100 answered No. You find that the proportion is 90% Yes and 10% No. Pretty simple, right?

But what happens when we cannot observe all the data?

When you know only part of your data, then you have to use inferential statistics. Inferential statistics is used when you know only a sample (a small part) of your data and you make guesses about the entire population (data).

Let's say you want to calculate the email open rate for the last 24 months, but you have data only from the last six months. Assume that out of 1000 emails, 200 were opened and 800 were not. This equates to a 20% open rate, with 80% not opened. This data is true for the last six months, but it might not be true for 24 months. Inferential statistics helps us understand how close we are to the entire population and how confident we are in this assumption.

The open rate for the sample may be 20%, but it may vary a little. Therefore, let's allow ±3%, which gives a range from 17% to 23%. This sounds pretty good, but how confident are we in these data? Put another way, what percentage of random samples taken from the entire population (data set) will fall in the range of 17%-23%?

In statistics, a 95% confidence level is the conventional threshold for a reliable estimate. It means that if we repeatedly drew random samples from the entire population, about 95% of them would produce an open rate of 17-23%, and the other 5% would be either above 23% or below 17%. Put simply, we are 95% sure that the open rate is 20% ± 3%.
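For the curious, the margin does not have to be guessed. A minimal sketch, assuming the usual normal approximation for a proportion and treating the 1000 emails as a random sample (which six months of history may not really be), looks like this in Python. The function name is mine; 1.96 is the standard z-value for a 95% confidence level.

```python
import math


def proportion_margin(successes, n, z=1.96):
    """Margin of error for a sample proportion (normal approximation).

    z=1.96 corresponds to a 95% confidence level.
    """
    p = successes / n
    return z * math.sqrt(p * (1.0 - p) / n)


opened, sent = 200, 1000
rate = opened / sent
margin = proportion_margin(opened, sent)

if __name__ == "__main__":
    print(f"Open rate: {rate:.1%} +/- {margin:.1%}")
    print(f"95% range: {rate - margin:.1%} to {rate + margin:.1%}")
```

With these numbers the margin comes out at roughly 2.5 percentage points, so the ±3% used in the example is a reasonable rounded figure.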

The term data stands for any value that describes an object or an event such as visitors, surveys, emails.

A data set has two components: the observation unit, for example visitors, and the variables, which can represent the demographic characteristics of your visitors such as age, salary or education level. Population refers to every member of your group, or in web analytics, all the visitors. Let's assume 10,000 visitors.

A sample is only a part of your population, based on a date range, visitors who converted, etc., but in statistics the most valuable sample is a random sample.

The data distribution is given by the frequency with which the values in the data set occur. By plotting the frequencies on a chart, with the range of the values on the horizontal axis and the frequencies on the vertical axis, we obtain the distribution curve. The most commonly used distribution is the normal distribution or the bell-shaped curve.

An easy way to understand this is to consider the number of visitors a website has. For example, the site averages 2000 visits/day, but on some days it gets more, such as 3000, or fewer, such as 1000.

Here, probability theory comes in handy.

Probability stands for the likelihood of an event happening, such as having 3,000 visitors/day, and is expressed as a percentage.

The most common example of probability that probably everybody knows is the coin flip. A coin has two faces, heads and tails. What is the probability of getting heads when flipping a coin? Well, there are two possibilities, so 100%/2 = 50%.

Enough theory; let's get a little bit more practical.

Excel is an amazing tool that can help us with statistics. It's not the best, but we all know how to use it, so let's dive right into it.

First, install the Analysis ToolPak.
Open Excel and go to Options -> Add-ins. At the bottom we will find the Manage drop-down; make sure it is set to Excel Add-ins.

Hit Go -> select Analysis ToolPak -> click OK.

Now under the Data tab we will find Data Analysis.

The Data Analysis tool can give you super fancy statistical information, but first let's start with something easier.

Mean, Median, and Mode

Mean is the statistical term for average; for example, the mean or average of 4, 5, 6 is 5. How do we calculate the mean in Excel? =AVERAGE(number1,number2,etc)


By calculating the mean we know how much we sold on average. This information is valuable when there are no extreme values (or outliers). Why? It looks like we sold on average $3,000 worth of products, but in fact we were lucky that somebody spent far more on September 6. We actually did pretty poorly during the previous six days, with an average of only $618. Excluding the extreme values from the mean can give a more relevant picture of performance.

The median is the observation situated in the middle of the data set. For example, the median of 224, 298, 304 is 298. In order to calculate the median for a large set of data we can use the following formula =MEDIAN(224,298,304)

When is the median useful? Well, the median is useful when you have a skewed distribution. For example, you are selling candies for $3 up to $15 a bag, but you also have some very expensive candies at $100 a bag that nobody really purchases on a regular basis. At the end of the month you have to make a report, and you will see that you sold mostly cheap candies and only a couple of the $100 bags. In this case calculating the median is more beneficial.

The easiest way to determine when to use the median vs. the mean is by creating a histogram. If your histogram is skewed by extreme values, then you know that the best way to go is by calculating the median.

The mode is the most common value, for example the mode for: 4,6,7,7,7,7,9,10 is 7

In Excel you can calculate the mode by using the =MODE(4,6,7,7,7,7,9,10) formula.

Although this looks nice, keep in mind that Excel's MODE function returns only a single value. In other words, if you calculate the mode for the data set 2,2,2,4,5,6,7,7,7,8,9 you can see that there are two modes, 2 and 7, but Excel will show you only one of them: 2.

When can we use the mode function? Calculating the mode is beneficial only for whole numbers such as 1, 2 and 3. It is not useful for fractional numbers such as 1.744, 2.443 and 3.323, as the chance of having duplicated numbers, and therefore a mode, is very small.

A great example of where to calculate the mode, or the most frequent value, is probably a survey.
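If you ever need the same three measures outside Excel, here is a small Python sketch using the standard statistics module and the numbers from the examples above. The daily sales list is hypothetical; I picked the values only so that they reproduce the $3,000 and $618 averages mentioned earlier.

```python
import statistics

print(statistics.mean([4, 5, 6]))                     # 5
print(statistics.median([224, 298, 304]))             # 298
print(statistics.mode([4, 6, 7, 7, 7, 7, 9, 10]))     # 7

# multimode (Python 3.8+) lists every mode instead of just one.
print(statistics.multimode([2, 2, 2, 4, 5, 6, 7, 7, 7, 8, 9]))  # [2, 7]

# Hypothetical week of sales: six ordinary days and one outlier.
daily_sales = [600, 650, 580, 640, 610, 628, 17292]
mean_all = statistics.mean(daily_sales)            # pulled up by the outlier
mean_without = statistics.mean(daily_sales[:-1])   # the six ordinary days
median_all = statistics.median(daily_sales)        # barely affected

print(mean_all, mean_without, median_all)          # 3000 618 628
```

The last line shows why the median is the safer summary for skewed data: one big order moves the mean from 618 to 3000, while the median stays close to a typical day.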


Let's say your blog recently received hundreds of guest posts; some of them are very good, but some are just not that good. Maybe you want to see how many of your blog posts received 10 backlinks, 20, 30 and so on, or maybe you are interested in social shares such as tweets or likes, or just simply visits.

Here we will categorize them into groups by using a visual representation called a histogram. In this example I will use visits per article as an easy example. The way I set up my Google Analytics account is as follows: I have a profile that tracks only my blog, nothing else. If you don't have such a profile set up yet, you can create a segment on the fly.

How do you do this? Pretty simple:

Now go to export->CSV

Open the Excel spreadsheet and delete all the columns except for Landing Page and Visits. Now create the ranges (also called bins) that you want the articles to be categorized into. Let's say we want to see how many articles generated 100 visits, 300, 500 and so on.

Go to Data -> Data Analysis -> Histogram -> OK

  • Input range will be the visits column
  • Bin Range will be the groups
  • Output Range, click on the cell where you want your histogram to show up
  • Check Chart Output
  • Click OK

Now you have a nice histogram that shows you the number of articles categorized by visits. To make it easier to understand this histogram, click on any cell from the Bin and Frequency table and sort the frequency by low to high.

Analyzing the data is now even easier. Go back and filter for all the articles with 100 visits or fewer in the last month (Visit drop down->Number filters->Between…0-100->Ok), then update them or promote them.
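The binning logic itself is simple enough to sketch in Python. The function below counts values the way Excel's Histogram tool does, where each bin holds the values up to and including its limit and anything larger lands in a final "More" bucket. The visit counts are hypothetical.

```python
from bisect import bisect_left


def histogram(values, bins):
    """Count values per bin, mimicking Excel's Histogram tool.

    A value goes into the first bin whose upper limit is >= the value;
    values above the last limit are counted under "More".
    """
    edges = sorted(bins)
    counts = [0] * (len(edges) + 1)
    for value in values:
        counts[bisect_left(edges, value)] += 1
    labels = [str(edge) for edge in edges] + ["More"]
    return dict(zip(labels, counts))


# Hypothetical visits per article for eight blog posts.
visits = [40, 100, 120, 250, 300, 480, 520, 900]
result = histogram(visits, bins=[100, 300, 500])

if __name__ == "__main__":
    for label, count in result.items():
        print(f"<= {label}: {count}" if label != "More" else f"More: {count}")
```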

Visits by source

How valuable is this report for you?

It's pretty good but not amazing. We can see ups and downs but…how much did YouTube contribute in February to the total visits? You can drill down but that is extra work, and it is very uncomfortable when the question arrives on a phone call with a client. To get the most out of your graphs, create valuable self-descriptive reports.

The report above is so much easier to understand. It takes more time to create it but it's more actionable.

What we can see is that in May, Facebook contributed a bigger share of the total than it usually does. How come? Probably the May marketing campaign was more effective than in other months, resulting in a lot of traffic. Go back and do it again! If it worked, repeat it.

If you suspect that May is bigger than the other months just by chance, run a Chi-Square test to check whether the increase in visits is statistically significant, and so whether the data supports the effectiveness of your campaign.

The Actual column is the number of visits; the Expected column is the mean (average) of the Actual column. The formula for the Chi-Square test is =1-CHITEST(N10:N16,O10:O16), where N10:N16 holds the values from Actual and O10:O16 the values from Expected.

The result, 100%, is the confidence level you can have that the month-to-month differences are not down to chance, i.e. that the work invested in each month's campaign affects the number of visitors coming from Facebook.
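To show what the formula is doing under the hood, here is a minimal Python sketch of the same calculation using only the standard library. The monthly visit counts are hypothetical, and the degrees of freedom are taken as the number of months minus one, which is my understanding of how CHITEST treats a single column of values.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution.

    Uses the closed-form series for integer degrees of freedom.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd degrees of freedom: start from the df=1 case and add terms.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range((df - 1) // 2):
        total += term
        term *= x / (2 * j + 3)
    return total


def chi_square_confidence(actual):
    """Equivalent of =1-CHITEST(actual, expected) with expected = mean."""
    expected = sum(actual) / len(actual)
    statistic = sum((a - expected) ** 2 / expected for a in actual)
    return 1.0 - chi2_sf(statistic, len(actual) - 1)


# Hypothetical Facebook visits for seven months, with one standout month.
facebook_visits = [120, 135, 110, 128, 310, 125, 130]
confidence = chi_square_confidence(facebook_visits)

if __name__ == "__main__":
    print(f"Confidence that the differences are not chance: {confidence:.1%}")
```

With one month this far above the rest the confidence comes out at practically 100%, while seven identical months would give 0%.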

When creating metrics, make them as easy as possible to understand, and relevant to the business model. Everybody should understand your reports.

The video below walks through another example of the Chi-Square function:

Moving average and linear regression for forecasting

We often see graphs like the one above. It can represent sales or visits, it doesn't really matter, it is constantly going up and down. There is a lot of noise in the data that we probably want to eliminate to generate a better understanding.

The solution: a moving average! This technique is sometimes used by traders for forecasting; stock prices are booming one day and hitting the floor the next.

Let's see how we can use their basic techniques to make it work for us.

Step 1:
Export to Excel the number of visits/sales for a long time period, such as one or two years.

Step 2:
Go to Data-> Data Analysis -> Moving Average ->OK

Input range will be the column with the number of visits

Interval will be the number of days over which the average is calculated. Here you should create one moving average with a higher number such as 30 and another one with a smaller number such as 7.

Output range will be the column right next to the visits column.

Repeat the steps for the interval of 7 days

Personal preference: I left the Chart Output and Standard Errors boxes unchecked on purpose; I will create a graph later on.
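If you prefer to see what the tool calculates, a trailing moving average can be sketched in a few lines of Python. Like the Excel output, the first rows have no value until a full interval of data is available (None here, #N/A in Excel). The function name is mine.

```python
def moving_average(values, interval):
    """Trailing moving average; None until a full interval is available."""
    averages = []
    window_sum = 0.0
    for i, value in enumerate(values):
        window_sum += value
        if i >= interval:
            # Drop the value that just slid out of the window.
            window_sum -= values[i - interval]
        averages.append(window_sum / interval if i >= interval - 1 else None)
    return averages


if __name__ == "__main__":
    print(moving_average([1, 2, 3, 4, 5], 3))  # [None, None, 2.0, 3.0, 4.0]
```

For the steps above you would call it twice on the visits column, once with an interval of 30 and once with 7.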

Your data now probably looks similar to this:

Now if you select all the columns and create a line chart it will look like this:

This representation has less noise, it is easier to read, and it shows some trends. The green line cleans up the chart a little, but it still reacts to almost every major event. The red line is more stable and shows the real trend.

At the end of the line chart you can see that it says Forecast. That is forecasted data based on previous trends.

In Excel there are two ways to create a linear regression forecast. The first is the formula =FORECAST(x,known_y's,known_x's), where "x" stands for the date you want to forecast, "known_y's" is the visits column and "known_x's" is the date column. This technique is not that complicated, but there is an easier way to do this.

By selecting the entire visits column and dragging down the fill handle, Excel will automatically forecast the following dates.

Note: Make sure to select the entire data set in order to generate an accurate forecast.
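For reference, a linear forecast of this kind boils down to an ordinary least-squares line. The sketch below is my own minimal Python version with the same argument order as the Excel formula; the day numbers and visit counts are hypothetical.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(x, known_ys, known_xs):
    """Predict y at x from a straight line fitted to the known points."""
    slope, intercept = linear_fit(known_xs, known_ys)
    return intercept + slope * x


# Hypothetical data: day numbers and visits on those days.
days = [1, 2, 3, 4, 5]
visits = [10, 12, 14, 16, 18]

if __name__ == "__main__":
    print(forecast(6, visits, days))  # about 20 for this perfectly linear data
```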

There is a rule of thumb for comparing a 7-day moving average with a 30-day one. As said above, the 7-day line reacts to almost every major change, while the 30-day one needs more time to change its direction. When the 7-day moving average crosses the 30-day moving average, you can expect a major change that will last longer than a day or two. As you can see above, around April 6th the 7-day moving average crosses the 30-day one and the number of visits goes down; around June 6th the lines cross again and the trend turns upward. This technique is useful when you are losing traffic and you are not yet sure whether it is a real trend or just a daily fluctuation.
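Spotting those crossing points by eye works on a chart, but it can also be automated. The following is a rough Python sketch of my own that reports every day on which the short average moves from one side of the long average to the other. The test series is hypothetical, and I use windows of 2 and 4 days in the example only to keep it short.

```python
def trailing_mean(values, end, window):
    """Mean of the `window` values ending at index `end` (inclusive)."""
    chunk = values[end - window + 1 : end + 1]
    return sum(chunk) / window


def crossovers(values, short=7, long=30):
    """Return (index, direction) for each short/long moving-average cross."""
    events = []
    previous = None
    for i in range(long - 1, len(values)):
        diff = trailing_mean(values, i, short) - trailing_mean(values, i, long)
        if diff == 0:
            continue  # lines touch; wait to see which side comes next
        if previous is not None and (diff > 0) != (previous > 0):
            events.append((i, "up" if diff > 0 else "down"))
        previous = diff
    return events


# Hypothetical series that rises and then falls.
series = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]

if __name__ == "__main__":
    print(crossovers(series, short=2, long=4))  # [(7, 'down')]
```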


The same results can be achieved by using the trendline feature of Excel: right click on the wiggling line -> select Add Trendline

Now you can select the Regression Type and you can use the Forecast feature as well. Trendlines are probably the most useful to find out if your traffic/sales are going upward, downward or are simply flat.

Without the linear function we cannot confidently tell whether we are doing better or not. By adding a linear trendline we can see that the slope is positive, and the trendline equation explains how our trend is moving.

X represents the number of days. The coefficient of x, 0.5516, is a positive number, which means that the trendline is going upward. In other words, with every day that passes the trend adds about 0.55 visitors.

R^2 measures how well the model fits the data. Our R^2 is 0.26, which means that the trendline explains 26% of the variation in daily visits. Simply said: the trend adds roughly one new visitor every other day, but with 74% of the variation left unexplained this is a loose tendency rather than a precise prediction.
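To see where the coefficient and R^2 come from, here is a minimal Python sketch of a linear trendline fit. The visit numbers are invented and `linear_trend` is a hypothetical helper, so expect different values from the chart above.

```python
from statistics import mean

def linear_trend(ys):
    """Fit y = slope * x + intercept with x = 1, 2, 3, ... and return slope, intercept, R^2."""
    xs = list(range(1, len(ys) + 1))
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 = 1 - (variation the line does not explain / total variation)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical noisy daily visits
visits = [500, 480, 530, 495, 520, 510, 545, 490, 535, 525]
slope, intercept, r_squared = linear_trend(visits)
print(f"slope={slope:.2f} visits/day, R^2={r_squared:.2f}")
```

A positive slope with a low R^2, as in this made-up series, is the same situation as in the chart: the direction is upward, but most of the day-to-day movement is noise.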

Seasonal Forecasting

Christmas is coming soon and forecasting the winter season can be beneficial especially when your expectations are high.

If you didn't get hit by Panda or Penguin and your sales/visitors are following a seasonal trend, then you can forecast a pattern for sales or visitors.

Seasonal forecasting is a technique that enables us to estimate future values of a data set that follows a recurring variation. Seasonal data sets are everywhere: an ice cream store will be very profitable during the summer season, and a gift store can reach its maximum sales during the winter holidays.

Forecasting data for the near future can be very beneficial, especially when we are planning to invest money in marketing for those seasons.

The following example is a basic model but this can be expanded to a more complex one to fit your business model.

Download the Excel forecasting example

I will break up the process into steps to make it easier to follow. The best way to implement it for your business is by downloading the Excel spreadsheet and following the steps:

  • Export your data (the more data you have, the better your forecast will be) and place the dates into column A and sales into column B.
  • Calculate the index for each month and add the data in column C

To calculate the index, scroll down to the bottom right of the spreadsheet, where you will find a table called Index. The index for Jan-2009 is calculated by dividing the sales from Jan-2009 by the average sales of the entire year 2009.

Repeat the index calculation for every month of every year.

In cells S38 to S51 we calculated the average index for every month.

Because our seasonality repeats every 12 months, we copied the index means into column C over and over again, matching up every month. As you can see, January 2009 has the same index data as January 2010 and 2011.

  • In column D calculate the Adjusted data by dividing the monthly sales by the index =B10/C10
  • Select the values from column A, B and D and create a line chart
  • Select the adjusted line (in my case the Red line) and add a linear trendline, check the "Display Equation on Chart" box

  • Calculate the backcasted non-seasonal data (column E) by multiplying the x value, which is the period number, by the coefficient from the trendline equation and adding the constant from the equation

After creating the trendline and displaying the equation on the chart, we take the coefficient to be the number that is multiplied by x, and the constant to be the number that follows it, which often has a negative sign.

We place the coefficient into cell E2 and the constant into cell F2.

  • Calculate the backcasted seasonal data by multiplying the index (column C) by the previously calculated data (column E)
  • Calculate MPE (mean percentage error) by dividing sales by the backcasted seasonal data and subtracting 1 (=B10/F10-1)
  • Calculate MAPE (mean adjusted percentage error) by squaring the MPE column (=G10^2)

In my case, cells F50 and F51 represent the forecasted data for Nov-2012 and Dec-2012. Cell H52 represents the error margin.

By using this technique we can say that in December 2012 we are going to make $22,022 ± 3.11%. Now go to your boss and show him how you can predict the future.
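The spreadsheet steps can also be sketched in Python. This is a simplified illustration rather than a copy of the workbook: the sales figures are generated from an invented seasonal pattern, the function name `seasonal_forecast` is hypothetical, and the trendline is fitted against the period number x = 1, 2, 3 and so on.

```python
from statistics import mean

def seasonal_forecast(sales, periods_ahead, season=12):
    """Index-based seasonal forecast. `sales` must cover whole seasons (full years)."""
    n = len(sales)
    years = [sales[i:i + season] for i in range(0, n, season)]
    # Index = a month's sales divided by the average sales of its year
    yearly_indexes = [[v / mean(year) for v in year] for year in years]
    # Average index for each calendar month across all years
    index = [mean(month) for month in zip(*yearly_indexes)]
    # Adjusted (deseasonalised) data = sales / index
    adjusted = [v / index[i % season] for i, v in enumerate(sales)]
    # Linear trendline through the adjusted data, with x = 1, 2, ..., n
    xs = list(range(1, n + 1))
    x_bar, y_bar = mean(xs), mean(adjusted)
    coefficient = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, adjusted))
                   / sum((x - x_bar) ** 2 for x in xs))
    constant = y_bar - coefficient * x_bar

    def seasonal_value(x):
        # Trend value for period x, re-seasonalised with that month's index
        return (coefficient * x + constant) * index[(x - 1) % season]

    backcast = [seasonal_value(x) for x in xs]
    pct_errors = [actual / fitted - 1 for actual, fitted in zip(sales, backcast)]
    forecast = [seasonal_value(x) for x in range(n + 1, n + periods_ahead + 1)]
    return forecast, pct_errors

# Hypothetical three years of monthly sales: slow growth plus a holiday peak
pattern = [0.8, 0.7, 0.9, 1.0, 1.0, 0.9, 0.9, 1.0, 1.0, 1.1, 1.3, 1.4]
sales = [round((10000 + 50 * i) * pattern[i % 12]) for i in range(36)]

forecast, pct_errors = seasonal_forecast(sales, periods_ahead=2)
mean_abs_error = mean([abs(e) for e in pct_errors])
print([round(v) for v in forecast], f"mean absolute error: {mean_abs_error:.1%}")
```

The percentage errors on the backcast play the same role as the error columns in the spreadsheet: they tell you how far the model was off on months where you already know the answer.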

Standard deviation

Standard deviation tells us how much values deviate from the mean; in other words, it gives us a range for what counts as normal. For example, if you have monthly sales, your daily sales will be different every day. You can use the standard deviation to calculate how much each day deviates from the monthly average.

There are two standard deviation formulas in Excel that you can use:

  • =STDEV, when you have sample data (Avinash Kaushik explains in more detail how sampling works)
  • =STDEVP, when you have the entire population, in other words when you are analyzing every visitor

My personal preference is =STDEV, simply because there are cases when the JS tracking code is not executed.

Let's see how we can apply Standard Deviation in our daily life

You probably see the wiggling graph in analytics every day, but it is not very intuitive. By using standard deviation in Excel you can visualize and better understand what is happening with your data. As you can see above, average daily visits were 501 with a standard deviation of 53. Most importantly, you can see where you exceeded the normal range, so you can go back and check which of your marketing efforts caused that spike.
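The same spike check can be sketched in Python with the standard library. The visit counts below are made up, and treating "more than one standard deviation above the mean" as a spike is an assumption of this example, not a fixed rule.

```python
from statistics import mean, pstdev, stdev

# Hypothetical daily visits for twelve days
visits = [510, 478, 530, 455, 495, 620, 502, 489, 515, 470, 498, 462]

avg = mean(visits)
sample_sd = stdev(visits)       # sample formula, comparable to Excel's =STDEV
population_sd = pstdev(visits)  # population formula, comparable to Excel's =STDEVP

# Flag days that exceed the mean by more than one sample standard deviation
upper = avg + sample_sd
spikes = [(day, v) for day, v in enumerate(visits, start=1) if v > upper]

print(f"mean={avg}, sample sd={sample_sd:.1f}, population sd={population_sd:.1f}")
print("days above the normal range:", spikes)
```

The sample figure is always slightly larger than the population figure on the same data, which makes it the more cautious choice when some visits go untracked.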

For the Excel document use the following link


Correlation is the tendency of one variable to change together with another variable. A common example in web analytics is the number of visitors and the number of sales: the more qualified visitors you have, the more sales you have. Dr Pete has a nice infographic explaining correlation vs. causation.

In Excel we use the following formula to determine the correlation:

As you can see above we have a correlation between Visits and Sales of 0.1. What does this mean?

  • between 0 and 0.3 is considered weak
  • between 0.3 and 0.7 is moderate
  • above 0.7 is strong

The conclusion in our case is that daily visits barely move together with daily sales, which suggests that the visitors you are attracting are not qualified for conversion. You also have to apply your business sense when making a decision, but a correlation as low as 0.1 should not be overlooked.
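For anyone who wants to reproduce the calculation outside Excel, here is a small Python sketch of the Pearson correlation coefficient together with the rule-of-thumb bands listed above. The visit and sales figures are invented, and `pearson` and `strength` are hypothetical helper names.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def strength(r):
    """Label a correlation using the rule-of-thumb bands: 0.3 and 0.7 are the cut-offs."""
    size = abs(r)
    if size < 0.3:
        return "weak"
    if size <= 0.7:
        return "moderate"
    return "strong"

# Hypothetical daily visits and daily sales
visits = [480, 510, 495, 530, 470, 520, 505, 490]
sales = [12, 9, 14, 10, 11, 13, 8, 12]

r = pearson(visits, sales)
print(f"correlation = {r:.2f} ({strength(r)})")
```

Keep in mind that the label only describes how closely the two series move together; it says nothing about which one causes the other.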

If you want to correlate three or more datasets you can use the correlation function from the Data Analysis tool.

Data->Data Analysis->Correlation

Your result will look similar to this one:

What we can see here is that none of the elements correlate with each other:

  • Sales and visitors = correlation of 0.1
  • Sales and Social Shares = correlation of 0.23

Descriptive Statistics for quick analysis

Now you have a pretty good understanding of the mean, standard deviation, etc., but calculating each statistical element can take a long time. The Data Analysis tool provides a quick summary of the most common elements.

  • Go to Data->Data Analysis-> Descriptive Statistics
  • Input Range – select the data you want to analyze
  • Output Range – select the cell where you want your table to be displayed
  • Check Summary Statistics

The result is pretty nice:

You already know most of the elements; what is new here are Kurtosis and Skewness.

Kurtosis describes how peaked the distribution is, and how heavy its tails are, compared with a normal distribution. The higher the kurtosis value, the sharper the peak and the heavier the tails. In our case the kurtosis is a very low number, which means the values are spread out fairly evenly.

Skewness explains if your data is negatively or positively skewed from a normal distribution. Now let me show you more visually what I mean:

Skewness: -0.28 (the distribution leans towards the higher values, 2500 and 3000)
Kurtosis: -0.47 (we have a very small peak deviation from the center)
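If you are curious how these two numbers are produced, the sketch below implements the sample skewness and sample excess kurtosis formulas that, as far as I know, Excel documents for its SKEW and KURT functions. The helper names and the data are made up for illustration.

```python
from statistics import mean, stdev

def excel_skew(data):
    """Sample skewness: n / ((n-1)(n-2)) * sum of cubed standardised deviations."""
    n = len(data)
    m, s = mean(data), stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in data)

def excel_kurt(data):
    """Sample excess kurtosis (a normal distribution scores about 0)."""
    n = len(data)
    m, s = mean(data), stdev(data)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * sum(((x - m) / s) ** 4 for x in data) - correction

# Hypothetical data with a long tail towards the low values
daily_sales = [1000, 2400, 2500, 2600, 2800, 2900, 3000, 3000]

print(f"skewness = {excel_skew(daily_sales):.2f}")
print(f"kurtosis = {excel_kurt(daily_sales):.2f}")
```

A negative skewness, as in this example, means the long tail points to the low values while most observations sit on the high side.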

These are some of the techniques that you can use when analyzing data. The biggest challenge with statistics and Excel is applying these techniques in various situations and not being limited to visits or sales. A great example of multiple statistical approaches implemented together was realized by Tom Anthony in his post about Link Profile Tool.

The examples above are just a small fraction of what can be done with statistics and Excel. If you are using other techniques that help you make faster and better decisions, I would love to hear about them in the comment section.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

SEOmoz Daily SEO Blog

Which Data Matters Most to Marketers? Take the Survey!


Posted by randfish

2012 was a year of triumphs and setbacks for marketers seeking the data to best accomplish their goals. Big improvements and additions in products like Google Analytics, GWMT, Bing Webmaster Tools, Mixpanel, KISSMetrics, Raven, and yes, SEOmoz PRO, too (along with dozens of others), helped many of us improve our reporting, auditing, and optimization efforts. But with the good came the bad, and setbacks like Google's expansion of keyword (not provided), the loss of referral data from iOS6, and kerfuffles over AdWords data appearing alongside rankings reared their heads, too.

When it comes to marketing data, I really like the concept behind Google's own mission statement: organize the world's information and make it universally accessible and useful. Unfortunately, I think the search giant has been falling short on a lot of the aspects that relate to our world, and thus it's up to third parties to pick up the slack. Moz is obviously part of that group, and we have plenty of self-interest there, but many other companies (and often Google and Bing themselves) are stepping in to help.

To help better understand the information that matters most to professionals in our field, we want to run a short survey focused specifically on data sources:

Data Sources Survey


We hope that this takes less than two minutes to complete, and that by aggregating broad opinions on the importance of data sources, we can better illustrate what matters most to marketers. In the spirit of transparency, we plan to share the results here on the Moz blog (possibly in an update to this post) in the next week or two.

Please help us out by taking the survey and by sharing it with your fellow marketers (or any professional you know who relies on marketing data).

Thanks very much!

*For those who have asked about SEOmoz's own plans regarding rankings vs. AdWords API data – we have removed AdWords search volume from our keyword difficulty tool (it was never part of the formula), and will be working on alternatives, possibly with the folks over at Bing. Like others in the field – Hubspot, Ginza, Conductor, Brightedge, Authority Labs, etc. – we plan to maintain rankings data in our software.


SEOmoz Daily SEO Blog

Comparing Backlink Data Providers


Since Ayima launched in 2007, we’ve been crawling the web and building our own independent backlink data. Starting off with just a few servers running in our Director of Technology’s bedroom cupboard, we now have over 130 high-spec servers hosted across 2 in-house server rooms and 1 datacenter, using a storage platform similar to that of Yahoo’s former index.

Crawling the entire web still isn’t easy (or cheap) though, which is why very few data providers exist even today. Each provider makes compromises (even Google does in some ways) in order to keep their data as accurate and useful as possible for their users. The compromises differ between providers though: some go for sheer index size, whilst others aim for freshness and accuracy. Which is best for you?

This article explores the differences between SEOmoz’s Mozscape, MajesticSEO’s Fresh Index, Ahrefs’ link data and our own humble index. This analysis has been attempted before at Stone Temple and SEOGadget, but our Tech Team has used Ayima’s crawling technology to validate the data even further.

We need a website to analyze first of all, something that we can’t accidentally “out”. Search Engine Land is the first that came to mind, as it is very unlikely to have many spam links or paid link activity.

So let’s start off with the easy bit – who has the biggest result set for SEL?

The chart above shows MajesticSEO as the clear winner, followed by a very respectable result for Ahrefs. Does size matter though? Certainly not at this stage, as we only really care about links which actually exist. The SEOGadget post tried to clean the results using a basic desktop crawler, to see which results returned a “200” (OK) HTTP Status Code. Here’s what we get back after checking for live linking pages:

Ouch! So MajesticSEO’s “Fresh” index has the distinct smell of decay, whilst Mozscape and Ayima V2 show the freshest data (by percentage). Ahrefs has a sizeable decay like MajesticSEO, but still shows the most links overall in terms of live linking pages. The problem with stopping at this level is that a link is much more likely to disappear from a page than the page itself is to disappear. Think about short-term event sponsors, 404 pages that return a 200, blog posts falling off the homepage, spam comments being moderated etc. So our “Tenacious Tim” got his crawler out, to check which links actually exist on the live pages:

Less decay this time, but at least we’re now dealing with accurate data. We can also see that Ayima V2 has a live link accuracy of 82.37%, Mozscape comes in at 79.61%, Ahrefs at 72.88% and MajesticSEO is just 53.73% accurate. From Ayima’s post-crawl analysis, our techies concluded that MajesticSEO’s crawler was counting URLs (references) and not actual HTML links in a page. So simply mentioning a URL somewhere on a web page was counting as an actual link. Their results also included URL references in JavaScript files, which won’t offer any SEO value. That doesn’t mean that MajesticSEO is completely useless though; I’d personally use it more for “mention” detection outside of the social sphere. You can then find potential link targets who mention you somewhere, but do not properly link to your site.
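The difference between a real HTML link and a mere URL reference can be illustrated with a short Python sketch. This is not Ayima's crawler or anyone else's; the class name `LinkAudit`, the sample page and the `example.com` domain are all made up, and the substring match on the domain is deliberately naive.

```python
from html.parser import HTMLParser

class LinkAudit(HTMLParser):
    """Separate real <a href> links to a domain from plain mentions of it."""

    def __init__(self, domain):
        super().__init__()
        self.domain = domain
        self.links = []          # href values of anchors pointing at the domain
        self.mentions = 0        # occurrences in text or scripts, outside anchors
        self._anchor_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._anchor_depth += 1
            href = dict(attrs).get("href") or ""
            if self.domain in href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "a" and self._anchor_depth > 0:
            self._anchor_depth -= 1

    def handle_data(self, data):
        # Page text and script contents arrive here; skip anchor text
        if self._anchor_depth == 0:
            self.mentions += data.count(self.domain)

# Hypothetical page: one real link, one text mention, one JavaScript reference
page = """
<p>Read more at <a href="http://example.com/news">Example News</a>.</p>
<p>We also like example.com but did not link it.</p>
<script>var source = "http://example.com/feed";</script>
"""

audit = LinkAudit("example.com")
audit.feed(page)
audit.close()
print(len(audit.links), "link(s),", audit.mentions, "mention(s)")
```

A crawler that counts all three occurrences as links would overstate the link count for this page by a factor of three, which is the kind of inflation described above.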

Ahrefs wins the live links contest, finding 84,496 more live links than MajesticSEO and 513,733 more live links than SEOmoz’s Mozscape! I still wouldn’t use Ahrefs for comparing competitors or estimating the link authority needed to compete in a sector though. Not all links are created equal, with Ahrefs showing both the rank-improving links and the crappy spam. I would definitely use Ahrefs as my main data source for “Link Cleanup” tasks, giving me a good balance of accuracy and crawl depth. Mozscape and Ayima V2 filter out the bad pages and unnecessarily deep sites by design, in order to improve their data accuracy and show the links that count. But when you need to know where the bad PageRank zero/null links are, Ahrefs wins the game.

So we’ve covered the best data for “mentions”, the best data for “link cleanup”, now how about the best for competitor comparison and market analysis? The chart below shows an even more granular filter, removing dead links, filtering by unique Class C IP blocks and removing anything below a PageRank 1. By using Google’s PageRank data, we can filter the links from pages that hold no value or that have been penalized in the past. Whilst some link data providers do offer their own alternative to PageRank scores (most likely based on the original Google patent), these cannot tell whether Google has hit a site for selling links or for other naughty tactics.
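As a rough illustration of that filter, the Python sketch below drops dead links, keeps one link per Class C block and discards anything below a minimum PageRank. The record layout (`url`, `ip`, `live`, `pagerank`), the function names and the sample data are assumptions made for this example, not the format of any provider's export.

```python
def class_c(ip):
    """First three octets of an IPv4 address, e.g. '192.0.2.10' -> '192.0.2'."""
    return ".".join(ip.split(".")[:3])

def filter_links(links, min_pagerank=1):
    """Keep live links with enough PageRank, one per Class C block."""
    seen_blocks = set()
    kept = []
    for link in links:
        if not link["live"]:
            continue  # dead link
        if link["pagerank"] is None or link["pagerank"] < min_pagerank:
            continue  # PageRank zero/null
        block = class_c(link["ip"])
        if block in seen_blocks:
            continue  # another link from the same Class C block
        seen_blocks.add(block)
        kept.append(link)
    return kept

# Hypothetical link records using documentation IP ranges
links = [
    {"url": "http://a.example/post", "ip": "192.0.2.10", "live": True, "pagerank": 4},
    {"url": "http://b.example/post", "ip": "192.0.2.77", "live": True, "pagerank": 3},
    {"url": "http://c.example/old", "ip": "198.51.100.5", "live": False, "pagerank": 5},
    {"url": "http://d.example/spam", "ip": "203.0.113.9", "live": True, "pagerank": 0},
    {"url": "http://e.example/news", "ip": "203.0.113.20", "live": True, "pagerank": 2},
]

kept = filter_links(links)
print([link["url"] for link in kept])
```

Because the first qualifying link in each block wins, sorting the input by PageRank beforehand would keep the strongest link per block instead of an arbitrary one.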

Whilst Ahrefs and MajesticSEO hit the top spots, the amount of processing power needed to clean their data to the point of being useful makes them untenable for most people. I would therefore personally only use Ayima V2 or Mozscape for comparing websites and analyzing market potential. Ayima V2 isn’t available to the public quite yet, so let’s give this win to Mozscape.

So in summary:

  • Ahrefs – Use for link cleanup
  • MajesticSEO – Use for mentions monitoring
  • Mozscape – Use for accurate competitor/market analysis

Juicy Data Giveaway

One of the best parts of having your own index, is being able to create cool custom reports. For example, here’s how the big SEO websites compare against each other:

“Index Rank” is a ranking based on who has the most value-passing Unique Class C IP links across our entire index. The league table is quite similar to HitWise’s list of the top traffic websites, but we’re looking at the top link authorities.

Want to do something cool with the data? Here’s an Excel spreadsheet with the Top 10,000 websites in our index, sorted by authority: Top 10,000 Authority Websites.

Rob Kerry is the co-founder of Ayima, a global SEO Consultancy started in 2007 by the former in-house team of an online gaming company. Ayima now employs over 100 people on 3 continents and Rob has recently founded the new Ayima Labs division as Director of R&D.