SEO Finds In Your Server Log

Posted by timresnik

I am a huge Portland Trail Blazers fan, and in the early 2000s, my favorite player was Rasheed Wallace. He was a lightning rod of a player, and fans either loved or hated him. He led the league in technical fouls nearly every year he was a Blazer, mostly because he never thought he committed any sort of foul. Many of those technicals came when the opposing player missed a free-throw attempt and ‘Sheed’ passionately screamed his mantra: “BALL DON’T LIE.” ‘Sheed’ asserts that a basketball has metaphysical powers that act as a system of checks and balances for the integrity of the game. While this is debatable (ok, probably not true), there is a parallel to technical SEO: marketers and developers often commit SEO fouls when architecting a site or creating content, but implicitly deny that anything is wrong.

As SEOs, we use all sorts of tools to glean insight into technical issues that may be hurting us: web analytics, crawl diagnostics, and Google and Bing Webmaster tools. All of these tools are useful, but there are undoubtedly holes in the data. There is only one true record of how search engine crawlers, such as Googlebot, process your website: your web server logs. As I am sure Rasheed Wallace would agree, logs are a powerful source of oft-underutilized data that helps keep the integrity of your site’s crawl by search engines in check.

A server log is a detailed record of every action performed by a particular server. In the case of a web server, you can get a lot of useful information. In fact, back in the day before free analytics (like Google Analytics) existed, it was common to just parse and review your web logs with software like AWStats.

I initially planned on writing a single post on this subject, but as I got going I realized that there was a lot of ground to cover.
Instead, I will break it into 2 parts, each highlighting different problems that can be found in your web server logs:

This post: how to retrieve and parse a log file, and how to identify problems based on your server’s response codes (404, 302, 500, etc.).
The next post: identifying duplicate content, encouraging efficient crawling, reviewing trends, looking for patterns, and a few bonus non-SEO tips.

Step #1: Fetching a log file

Web server logs come in many different formats, and the retrieval method depends on the type of server your site runs on. Apache and Microsoft IIS are two of the most common. The examples in this post will be based on an Apache log file from SEOmoz.

If you work in a company with a Sys Admin, be really nice and ask him/her for a log file with a day’s worth of data and the fields that are listed below. I’d recommend keeping the size of the file below 1 gig, as the log file parser you’re using might choke. If you have to generate the file on your own, the method for doing so depends on how your site is hosted. Some hosting services store logs in your home directory in a folder called /logs and will drop a compressed log file in that folder on a daily basis. You’ll want to make sure it includes the following columns:

Host: you will use this to filter out internal traffic. In SEOmoz’s case, RogerBot spends a lot of time crawling the site and needed to be removed for our analysis.
Date: if you are analyzing multiple days, this will allow you to analyze search engine crawl rate trends by day.
Page/File: this will tell you which directory and file is being crawled and can help pinpoint endemic issues in certain sections or with types of content.
Response code: knowing how the server responded, whether the page loaded fine (200), was not found (404), or the server was down (503), provides invaluable insight into inefficiencies that the crawlers may be running into.
Referrers: while this isn’t necessarily useful for analyzing search bots, it is very valuable for other traffic analysis.
User Agent: this field will tell you which search engine made the request. Without this field, a crawl analysis cannot be performed.

Apache log files by default are returned without User Agent or Referrer; this is known as a “common log file.” You will need to request a “combined log file.” Make your Sys Admin’s job a little easier (and maybe even impress) and request the following format:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined

For Apache 1.3 you just need:

CustomLog log/access_log combined

For those who need to manually pull the logs, you will need to create a directive in the httpd.conf file with one of the above. A lot more detail here on this subject.

Step #2: Parsing a log file

You probably now have a compressed log file like ‘mylogfile.gz’ and it’s time to start digging in. There are myriad software products, free and paid, to analyze and/or parse log files. My main criteria for picking one are: the ability to view the raw data, the ability to filter prior to parsing, and the ability to export to CSV. I landed on Web Log Explorer (http://www.exacttrend.com/WebLogExplorer/) and it has worked for me for several years. I will use it along with Excel for this demonstration. I’ve used AWStats for basic analysis, but found that it does not offer the level of control and flexibility that I need. I’m sure there are several more out there that will get the job done.

The first step is to import your file into your parsing software. Most web log parsers will accept various formats and have a simple wizard to guide you through the import. With the first pass of the analysis, I like to see all the data and do not apply any filters. At this point, you can do one of two things: prep the data in the parser and export it for analysis in Excel, or do the majority of the analysis in the parser itself.
I like doing the analysis in Excel in order to create a model for trending (I’ll get into this in the follow-up post). If you want to do a quick analysis of your logs, using the parser software is a good option.

Import Wizard: make sure to include the parameters in the URL string. As I will demonstrate in later posts, this will help us find problematic crawl paths and potential sources of duplicate content.

You can choose to filter the data using some basic regex before it is parsed, for example if you only want to analyze traffic to a particular section of your site.

Once you have your data loaded into the log parser, export all spider requests and include all response codes.

Once you have exported the file to CSV and opened it in Excel, here are some steps and examples to get the data ready for pivoting into analysis and action:

1. Page/File: in our analysis we will try to expose directories that could be problematic, so we want to isolate the directory from the file. The formula I use to do this in Excel looks something like this (here C29 is the cell holding the Page/File value).

Formula: =IF(ISNUMBER(SEARCH("/",C29,2)),MID(C29,(SEARCH("/",C29)),(SEARCH("/",C29,(SEARCH("/",C29)+1)))-(SEARCH("/",C29))),"no directory")

2. User Agent: in order to limit our analysis to the search engines we care about, we need to search this field for specific bots. In this example, I’m including Googlebot, Googlebot-Images, BingBot, Yahoo, Yandex and Baidu (H29 is the cell holding the User Agent value).

Formula (yeah, it’s U-G-L-Y):

=IF(ISNUMBER(SEARCH("googlebot-image",H29)),"GoogleBot-Image",IF(ISNUMBER(SEARCH("googlebot",H29)),"GoogleBot",IF(ISNUMBER(SEARCH("bing",H29)),"BingBot",IF(ISNUMBER(SEARCH("Yahoo",H29)),"Yahoo",IF(ISNUMBER(SEARCH("yandex",H29)),"yandex",IF(ISNUMBER(SEARCH("baidu",H29)),"Baidu","other"))))))

Your log file is now ready for some analysis, with a directory column and a bot column alongside the raw fields.

Let’s take a breather, shall we?
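For readers who would rather script this prep than do it in a parser and Excel, the steps above can be sketched in Python. This is a minimal sketch, not the workflow used for the SEOmoz analysis: it assumes the log is in the Apache combined format requested in Step #1, and the names COMBINED, classify_bot, top_directory, parse_line and read_log are made up for illustration. One deliberate difference from the Excel formula: top_directory drops the query string before looking for the directory.

```python
import gzip
import re

# One line of an Apache "combined" log (assumed format, see Step #1).
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Order matters: "googlebot-image" must be tested before "googlebot",
# just as in the nested Excel IF formula.
BOTS = [
    ("googlebot-image", "GoogleBot-Image"),
    ("googlebot", "GoogleBot"),
    ("bing", "BingBot"),
    ("yahoo", "Yahoo"),
    ("yandex", "Yandex"),
    ("baidu", "Baidu"),
]


def classify_bot(agent):
    """Map a raw User Agent string to a short bot label, or 'other'."""
    lowered = agent.lower()
    for needle, label in BOTS:
        if needle in lowered:
            return label
    return "other"


def top_directory(path):
    """Return the first directory of a URL path, e.g. '/blog/post' -> '/blog'."""
    parts = path.split("?", 1)[0].split("/")
    # "/blog/post" splits into ["", "blog", "post"]; "/index.html" has no directory.
    return "/" + parts[1] if len(parts) > 2 else "no directory"


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    row = match.groupdict()
    row["status"] = int(row["status"])
    row["directory"] = top_directory(row["path"])
    row["bot"] = classify_bot(row["agent"])
    return row


def read_log(path):
    """Yield parsed rows from a gzip-compressed log file, skipping bad lines."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            row = parse_line(line)
            if row is not None:
                yield row
```

Calling read_log("mylogfile.gz") would yield one dictionary per request, with the directory and bot fields already derived, ready to be written to CSV or counted directly.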
Step #3: Uncover server and response code errors

The quickest way to suss out issues that search engines are having with the crawl of your site is to look at the server response codes that are being served. Too many 404s (page not found) can mean that precious crawl resources are being wasted. Massive numbers of 302 redirects can point to link equity dead-ends in your site architecture. While Google Webmaster Tools provides some information on such errors, it does not provide a complete picture: LOGS DON’T LIE.

The first step in the analysis is to generate a pivot table from your log data. Our goal here is to isolate the spiders along with the response codes that are being served. Select all of your data and go to ‘Data > Pivot Table.’

On the most basic level, let’s see who is crawling SEOmoz on this particular day.

There are no definitive conclusions that we can make from this data, but there are a few things that should be noted for further analysis. First, BingBot is crawling the site at about an 80% higher clip than Google. Why? Second, ‘other’ bots account for nearly half of the crawls. Did we miss something in our search of the User Agent field? As for the latter, we can see from a quick glance that most of what is counted as ‘other’ is RogerBot, so we’ll exclude it.

Next, let’s have a look at server codes for the engines that we care most about.

I’ve highlighted the areas that we will want to take a closer look at. Overall, the ratio of good to bad looks healthy, but since we live by the mantra that “every little bit helps,” let’s try to figure out what’s going on.

1. Why is Bing crawling the site at 2x the rate of Google? We should investigate whether Bing is crawling inefficiently and whether there is anything we can do to help it along, or whether Google is not crawling as deep as Bing and whether there is anything we can do to encourage a deeper crawl.

By isolating the pages that were successfully served (200s) to BingBot, the potential culprit is immediately apparent.
Nearly 60,000 of the 100,000 pages that BingBot crawled successfully were user login redirects from a comment link.

The problem: SEOmoz is architected in such a way that if a comment link is requested and JavaScript is not enabled, it will serve a redirect (being served as a 200 by the server) to an error page. With nearly 60% of Bing’s crawl being wasted on such dead-ends, it is important that SEOmoz block the engines from crawling these links.

The solution: add rel="nofollow" to all comment and reply-to-comment links. Typically, the ideal method for telling an engine not to crawl something is a directive in the robots.txt file. Unfortunately, that won’t work in this scenario because the URL is being served via the JavaScript after the click.

GoogleBot is dealing with the comment links better than Bing and avoiding them altogether. However, Google is successfully crawling a handful of links that are login redirects. Take a quick look at the robots.txt and you will see that this directory should probably be blocked.

2. The number of 302s being served to Google and Bing is acceptable, but it doesn’t hurt to review them in case there are better ways of dealing with some of the edge cases. For the most part, SEOmoz is using 302s for defunct blog category architecture that redirects the user to the main blog page. They are also being used for private message pages (/message), and a robots.txt directive should exclude these pages from being crawled at all.

3. Some of the most valuable data that you can get from your server logs are links that are being crawled that resolve in a 404. SEOmoz has done a good job managing these errors and does not have an alarming level of 404s. A quick way to identify potential problems is to isolate 404s by directory. This can be done by running a pivot table on the 404 rows with “Directory” as your row label and count of “Directory” in your value field.
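Outside Excel, the same two pivots (bots against response codes, and 404s by directory) can be approximated with a few lines of Python. This is only a sketch under stated assumptions: the rows below are made-up stand-ins for parsed log records, not SEOmoz data, and crosstab and count_by are hypothetical helper names rather than functions from any library.

```python
from collections import Counter

# Made-up rows standing in for parsed log records (one dict per request).
# Real data would come from your own log export.
rows = [
    {"bot": "BingBot", "status": 200, "directory": "/comments"},
    {"bot": "BingBot", "status": 404, "directory": "/comments"},
    {"bot": "BingBot", "status": 404, "directory": "/comments"},
    {"bot": "GoogleBot", "status": 200, "directory": "/blog"},
    {"bot": "GoogleBot", "status": 404, "directory": "/blog"},
    {"bot": "GoogleBot", "status": 302, "directory": "/message"},
]


def crosstab(records, row_key, col_key):
    """Count records per (row_key, col_key) pair, like a two-field pivot table."""
    return Counter((rec[row_key], rec[col_key]) for rec in records)


def count_by(records, key, **filters):
    """Count records per value of `key`, keeping only those matching `filters`."""
    return Counter(
        rec[key]
        for rec in records
        if all(rec[name] == value for name, value in filters.items())
    )


# Pivot 1: which response codes each bot is being served.
bot_by_status = crosstab(rows, "bot", "status")

# Pivot 2: 404s grouped by directory, largest first.
not_found_by_directory = count_by(rows, "directory", status=404)

for directory, hits in not_found_by_directory.most_common():
    print(directory, hits)
```

The same count_by helper also covers the earlier BingBot check, for example count_by(rows, "directory", bot="BingBot", status=200) to see where its successful requests are concentrated.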
The problem: the main issue that pops out of the resulting table is that 90% of the 404s are in one directory, /comments. Given the issues with BingBot and the JavaScript-driven redirect mentioned above, this doesn’t really come as a surprise.

The solution: the good news is that since we are already using rel="nofollow" on the comment links, these 404s should also be taken care of.

Conclusion

Google and Bing Webmaster tools provide you information on crawl errors, but in many cases they limit the data. As SEOs we should use every source of data that is available and, after all, there is only one source of data that you can truly rely on: your own.

LOGS DON’T LIE!


10 Predictions for Inbound Marketing in 2013

Posted by randfish

As is tradition here at Moz, I’m conducting my annual analysis of my predictions from 2012, and if I score high enough, predicting what will happen in 2013. I like to use this process because it keeps me honest: if I suck at predicting what will happen in a 12-month span, should you really listen to me for the next 12 months? This year, I’m also broadening my focus beyond just SEO to all of inbound marketing: search, social, content, PR, CRO, and email. Hence, if my predictions from last year do well, I’ll be making a few more guesses about the year to come than usual.

Here’s how scoring works:

Spot On (+2) - when a prediction hits the nail on the head and the primary criteria are fulfilled
Partially Accurate (+1) - predictions that are in the area, but are somewhat different than reality
Not Completely Wrong (-1) - those that landed near the truth, but couldn’t be called “correct” in any real sense
Off the Mark (-2) - guesses which didn’t come close

The rules state that if the score is lower than +1, I’m not allowed to make predictions for the coming year. Here’s to hoping!

In 2012, I made 8 predictions: Bing will have a slight increase in US marketshare, but remain


Outranking Google

Posted by Fryed7

“Know your enemy, know yourself, and you can fight a hundred battles without disaster…” The Art of War, Sun Tzu

I wouldn’t say Google is the “enemy”, but all too often they’re far from being a friend. Understanding Google and understanding yourself will set you up to avoid catastrophe. Here on SEOmoz, we love reading about tactics. Smart, repeatable, step-by-step processes you can implement and see results from right away. Everything else is a frustration, right? So, if you’ll kindly bear with me… we’re going to talk strategy rather than tactics. How to future-proof your marketing from Google. Deep breaths. First, let me paint you a picture…

Imagine, YOU are Larry Page. You have billions of dollars to spend… (Image Credit: One Billion Dollar, Most Expensive Artwork Ever) You have thousands of super-talented software engineers. You also have thousands of super-savvy marketers. (Image Credit: Joel on Software) You derive almost all your revenues currently from selling adverts. Oh, and you also have thousands of shareholders and analysts breathing down your neck. What do you do?

Some ideas that come to mind…

Turn commercially-focused searches such as shopping into a pay-to-play game.
Bypass parasitic “search within search” sites and own other multi-billion dollar industries such as flights and hotels.
Start experimenting with disrupting job search, insurance comparison, credit card comparison, people search, lawyer search, real estate search, Google+ dating… and put forward the convincing argument that it’s better for users (at least in the short term?).
Use Adwords data to find other high-paying industries where Google can cut out the middleman, set up shop on their own, and take a higher margin.
Buy out or joint venture with successful incumbents to gain rapid market share and infrastructure in these high-margin industries.
Replicate the total dominance of Adwords in search in other media channels.
Google TV, intelligent and responsive outdoor media, and Google Glasses (or whatever becomes of that), coupled with the inevitable integration of everything with Google+, would give Google unparalleled reach and targeting for advertisers across every media channel.

It begins to get very evil, very quickly… This is a new world we could be entering into. Basic rules of SEO may begin to go out of the window. Building anchor text links to “hotels in New York” is meaningless when Google has rolled out their own solution straight into the search results. It sort of feels like this.

So what to do about the 600lb gorilla in the cage? Here are five strategies to get you thinking.

Strategy #1: Optimize Search Demand, not Search Supply

This isn’t about experimenting with influencing Google Suggest, running Superbowl ads, or other similar short-term wins. You need to build something that, once someone knows about you, they’d be crazy not to come back to each time they need to buy. Brands, as companies and as products, will perform better against Google. Building a brand stops both people and Google from treating your products as commodities. They’ll come to you first.

Zappos, for instance, strives to delight customers. Whether it’s the fast, free delivery and free returns for up to a year, or the huge resources pumped into phone calls to build relationships with customers, Zappos has built a truly great platform for customers. ~75% of their sales are from repeat customers. Being remarkable is important. Instead of relying on unbranded search terms for shoes, it’s better to rely on word of mouth marketing by your delighted customers. They might start at Google, but they search for your brand rather than the product they want. Google, outranked!

Similarly, invent your own search demand. Apple didn’t make a “tablet PC”. They made an iPad. The ensuing onslaught of consumer searches was for the “iPad”, a branded term.
Since users love brands, and Google says it will continue to serve its users’ interests first, Google will steer out of the way. You don’t even have to be a massive company conquering a massive industry to do this. The brand new startup Dollar Shave Club pulled off a one-hit video stunt, but the long term marketing win that delivers lasting value is people talking about their brand.

Action: Build a Brand

Branding isn’t just a name. It’s what other people call it and why they identify with it. (Fast Company has an excellent primer on brand building.) How do people identify with your company and products? You need to spend time mapping this out and defining a brand for current and future customers. The community on Inbound.org has some great links on branding too. Of course, you have to make sure you’re all set up to win your branded SERPs. Here are two Whiteboard Friday refreshers for you on Dominating Your Brand SERPs and the Renewed Value of Branding.

Your Small First Step: Answer These Two Questions

What information is so critical to your customer’s next purchase that, if you had it on your site AND they knew about it, they’d be crazy not to check it out?
What in your company can you brand so that you can manipulate search demand?

Strategy #2: Build Genuine Permission Assets

If customers really care about you, they don’t need Google to find you. You need to build a customer base who want to hear from you, and who can buy from you in the future. These customers will be people who will come straight to you because they know and trust you. I bet you’ve read countless articles and guides on growing larger email lists, getting more Twitter followers, and earning more likes on your Facebook page. That information is great, but the trouble with this scoreboard mentality is that it focuses you on building sheer numbers rather than real engagement. A list of 100,000 subscribers isn’t really a list of 100,000 loyal fans.
50,000 Twitter followers aren’t really 50,000 people who will go out of their way for you. 1,000 Facebook Likes aren’t really a list of 1,000 people who will passionately defend the webpage and content if it’s ever criticized. The bar in and out is set too low. You have to build genuine permission assets from your audience, measured by their loyalty rather than their numbers. What have your followers done for you lately? Look at some of these examples…

TheOatmeal has a clear, loyal following. His tribe rallied behind him during his recent legal spat.
Seth Godin has a clear, loyal following. His tribe helped him convince publishers to put his upcoming book in physical stores.
Zappos has a clear, loyal following. Their tribe post rave reviews and testimonials publicly on their Facebook page. In their thousands…

If your business closed down, website disappeared and employees disbanded today, would your customers, audience, and community miss you tomorrow? Or the next time they need to buy?

Action: Build a Loyal Tribe of Customers

You need to build a loyal audience and community of customers that will go out of their way for you, even if that means just skipping Google search results. Find the people who will miss you dearly when you’re gone. Those loyal few are your strongest asset. Don’t measure your audience by numbers, but measure their responses instead. How much revenue do they generate? How often do they send enquiries? What kind of email do they send to you? Build a community. Connect your followers together, and build a stickier brand. Jen Lopez put together an excellent, pithy post on using community as an inbound marketing channel.

Your Small First Step: Connect a Dozen People Together

See if these people would be interested in forming a community that aligns with your brand values by seeding a relevant conversation. This ties in closely with the actions in Strategy #1, building a brand.
This could be online (Twitter chat, LinkedIn group, webinar, G+ hangout) or offline (drinks, meetup, conference, breakfast).

BONUS! Buy the book Tribes by Seth Godin and/or watch Seth’s TED Talk on The Tribes We Lead (it’s 20 minutes, so you can watch it in your lunch break).

Strategy #3: Prepare for Long-Term SEM

If Google Shopping and Google Flights are any indicator of the future, it’s likely Google will put you on a diet of some kind of Adwords-type service you must adopt in order to stay in the SERPs. That means you must be getting ready to master online advertising in your niche, which doesn’t work without knowing your lifetime customer value, costs per customer acquisition and conversion rates.

Who’s to say you can’t thrive under Google? In search advertising in particular, where Adwords’ quality score appears to tie more closely with SEO (relevant pages, strong social signals, passing “the panda questionnaire”), continuing with traditional SEO appears to be the future for staying in the SERPs. SEOs and Adwords folks appear to be getting closer anyway, and there’s more and more relevant information we can learn from one another. In the long run for both, in competitive niches especially, knowing your numbers and driving down the cost of acquiring customers will only help you win, be that by freeing up budget for PPC or for SEO spend on content, outreach, acquiring data or anything else. Conversion rate optimization is the key to unlocking a prosperous future with Google. You need to get your team on top of this.

Action: Conquer Conversion Rate Optimization

In order to truly win at SEM and the Adwords game, you must conquer conversion rate optimization. Thankfully there are many great resources on CRO here on SEOmoz; my favourite so far is by Stephen Pavlovich. Send this to your team. I’ve always loved Conversion Rate Experts’ case studies for insights into processes as well as for reinforcing the case for CRO.
Here’s an example of a case study where they doubled a company’s conversion rate, making them £14 million extra that year, and another slightly older case study, but with a familiar face.

SEM is process driven. CRO is process driven, too. The asset you need to build is a process for testing and winning at CRO. You need to bring your developers, designers, other marketers, and C-level execs on board with the idea of incremental benefits from CRO, and get them committed to a continual process of testing new ideas. Incidentally, the same skills will be needed for mastering Adwords, when the time comes. Consider a rolling contest for people to suggest things to test, with a significant reward for anyone whose idea moves the needle by a significant percentage. Keeping that in mind…

Your Small First Step: Set Up One Small CRO Experiment

Read through the guides above, and pinpoint one small test you can implement. The first test might be painful as there are no processes in place to make it all happen easily, but once you’re set up you can run more and more experiments. But start with one. Today. You could have tangible results at the end of the week. More money, please!

Strategy #4: Overseas Conquest

Our search comrades in Russia, China, South Korea, Japan, and many other countries will still benefit from a lack of Google dominance… for the time being, at least. Focus on targeting places where Google is not inherently strong and is unlikely to invade within the medium term. There will still be good money to be made here, and often these are high-growth, emerging markets. Who in travel doesn’t want to be selling holidays to the emerging middle class in China? That said, “understand your enemy.” How long until Google, Microsoft, or even Facebook makes a move for Yandex, Baidu, Naver, and all the foreign incumbent search engines?

Action: Optimize for “unGoogled” Emerging Markets

First, take care of the essential technical SEO to target foreign countries.
Rand put together a Whiteboard Friday on international SEO a while back, and Matt Cutts also has some suggestions for using unique domains to target specific countries. Take a look at this detailed list of country domain extensions. Yandex, Baidu, and others all have broadly similar interests algorithmically, so you won’t go wrong following the Western SEO advice you get from SEOmoz or Google’s user guidelines.

A few links worth following and bookmarking:

An excellent blog on SEO for Russia, and Yandex in particular, at Russian Search Tips.
Yandex Webmaster Tools (in English)
Baidu Webmaster Tools (not in English currently, but usable by auto-translate in Google Chrome)
Submit your site to Baidu (sounds ol’ school, doesn’t it?) here if you aren’t already indexed.
SEO for Naver, by Search Engine Watch

If you’ve got any additional helpful links to add, please post them in the comments :)

Although it’s horrible, overwhelming advice… you’ll need to have language skills on your SEO team. Bring bilingual SEOs on board by recruiting internally and externally. Foreign language skills are going to become invaluable for tapping lucrative emerging markets. Like having talented designers, developers, and marketing processes, you either have them or you don’t. Put yourself ahead of the competition.

Your Small First Step: Find One Bilingual Helper

Search within your organization, on LinkedIn, Facebook, maybe via local universities and colleges for people who have an interest in online marketing and language skills in emerging markets. It needn’t be something full time and permanent, but at least someone you can turn to and ask about their local market. Just one person who can speak Russian or Chinese or something significant.

BONUS! Buy your .cn, .ru, .kr etc. domains.

Strategy #5: Build an Essential Step in the Chain

Search is only one step in the chain. You can construct your business to force people and/or Google to come through you before or after visiting Google.
There are two ways to do this: you can either win the context war (pre-commercial search) or you can win the fulfillment war (post-commercial search).

Google can’t create contextual information surrounding a search without degrading their search quality. If Google starts inserting flight and hotel search results whenever you search for “Maui,” maybe looking for pictures for a project or something, it’s going to frustrate users. This is where you can win. Amazon jumps in early on the e-commerce chain by becoming the canonical source of reviews and product research information. What’s stopping you from listing products on Amazon? Similarly, TripAdvisor drives huge volumes of traffic by becoming the canonical source of information for hotel reviews.

Win the fulfillment war by becoming the one and only way of fulfilling a certain good. This might mean proprietary products, proprietary software or a complete monopoly over a certain, specific market. Apple owns the supply chain for sales of their goods, but you don’t have to be a pan-global company to have a similar effect. Travelocity earns commissions from selling tickets. They launched a Travelocity rewards program for regular customers and offered various ways to earn points, redeemable on more travel, by booking tickets through them and using Travelocity-branded credit cards. This encourages people to keep returning to book through Travelocity, while still maintaining other loyalties and benefits such as frequent flier miles with the airlines they actually travel with.

Action: Build an Essential Step of the Chain

What content would be so incredibly useful that users would have to go through it? Take a look at Rand’s Whiteboard Friday from a few years back on The Path to Conversion, and use it to work out where you can add incredible value in your market. Alternatively, what value-add could you build into the chain that Google can’t touch? Could you add a loyalty program with unique rewards?
Your Small First Step: Map Out the Buying Process from Research to Fulfillment

… then brainstorm ideas around each step where you might be able to add value that can’t be copied easily.

In Summary

Google has a ridiculous amount of resources and motivation to disrupt your market. They’re going to take your cake and eat it too, unless you can fight for your turf. Use these five strategies to fend off their advance:

Build a Brand: start by identifying your brand positioning
Build Genuine Permission Assets: connect a dozen people together + buy the Tribes book or watch the Tribes talk by Seth Godin
Prepare for Long-Term SEM: start a small CRO test
Conquer Emerging Markets: find one bilingual helper + buy your foreign domains
Build an Essential Step in the Chain: start by mapping out the buying process, from research to fulfillment

PRO Tip: Do all of them!

… but if none of these hit the spot, consider this …

“Strategy” #6: Can’t Beat ‘Em? Join Them!

Image Credit: This Green Machine

Of course, if you can’t find a way of outranking Google in the long run, consider giving in. Expect Google+ to encircle your industry. Embrace G+ now, and win in the long run. Alternatively, consider selling out to Google. Or you could give up completely. Google is hiring. ;)

… and on that bombshell! I think it’s time to end! See you in the comments for more serious strategy talk, and also more “If I was CEO of Google I would _______________________” :)
