Announcing Fresh Web Explorer

Posted by Matthew Brown Have you ever wished you had an easy way to track all of your links, social mentions, and web citations in one place? If so, you’re going to like the latest addition to your SEOmoz PRO account. Today, we are releasing a new beta product to our PRO subscribers: Fresh Web Explorer.   Try Fresh Web Explorer Why did we build Fresh Web Explorer? One of the most challenging tasks as an online marketer is keeping track of all the latest blogs, forums, and news sites on the web that mention your brand or site. Many of the tools out there can be frustrating to use and don’t have the metrics, scalability, or features that I need to effectively keep track of important links and mentions. Google Alerts can be hit or miss. Topsy is terrific, but it only covers social mentions. Trackur , Ubervu , Buzzstream, and SocialMention all offer a unique set of features, but I frequently rely on a number of different tools to provide me with an instant look into mentions of the sites and brands I track. We built Fresh Web Explorer to provide an easier way to give you a fast, comprehensive look at the latest mentions of and links to your content across the web.       What’s different about Fresh Web Explorer? Fresh Web Explorer (FWE) functions a lot like Open Site Explorer, so the interface will be familiar to OSE users. However, the data is extremely recent, and rather than just show you links, we grab full text content of articles, blog posts, forum threads, user comments, and other web content. FWE doesn’t just show you links, but all term, brand, or phrase mentions as well.    FWE is powered by our Freshscape Index, which is a 30 day index of 4.3 million feeds (and counting). There’s a new Freshscape index every eight hours, sortable by one week, two weeks, or 30 days of mentions. 
You can also sort your data by Feed Authority, our new metric created specifically for Fresh Web Explorer.

Feed Authority directly measures the importance of any feed on a scale of 1-100. It is a machine learning model that predicts the number of subscribers for a given feed and distinguishes among the many different feeds on any site. For example, it will assign a lower score to a comment feed associated with a six-month-old blog post than to the main feed associated with the blog. In this way, it is analogous to Page Authority, but applied to feeds. We currently use features extracted from crawling the feed (number of posts, post frequency, etc.) as well as Mozscape metrics to compute the score. Our data scientists are working to improve this metric, so expect to see some of the scores change as they refine the algorithm and introduce additional features.

Warning: We’re going to get even more nerdy about Feed Authority for a quick second. Across the Freshscape index, Feed Authority is distributed as follows: approximately 25% of the index has a Feed Authority less than 2.0, with the other 75% having higher values. The feeds with low scores are mostly stale (no longer updated), have very few or no links, or have malformed XML. A similar graph for all feeds on the internet would have the opposite shape, with 75%+ of feeds having Feed Authority less than 2.0 (we confirmed this with a random sample of feeds from our Mozscape index). We minimized the number of low-quality feeds in our index by carefully building it from a set of high-quality blog directories and a curated list of feeds.

Smooth Operator
Bringing it back to using FWE, there are a number of operators you can use to customize your search. In particular, you may find yourself making extensive use of the ‘Match phrase exactly’ operator, by using double quotes around your search term or phrase.
This cues Fresh Web Explorer to return only results where your phrase or terms appear on a page exactly as you searched for them in FWE, rather than returning results where the terms may appear anywhere on the page and in any order. When searching on non-branded or very popular terms, using this operator may surface a more precise set of results from FWE.

Export FWE data to customize your reports
If you’re inclined to mix and match this data with other sources, FWE provides you with the ability to export up to 10,000 mentions in the Freshscape index, in .csv format. This export allows you to sort a large number of mentions by date found, Feed Authority, domain, HTML title, and URL. One additional field is available in the export that’s not in the FWE web interface: the feed source where FWE found the page containing the mention. This can provide useful insight into why a Feed Authority score might be low, even though the page mentioning your search is located on a strong domain.

We’ve put together a video walkthrough and a detailed FAQ to get you started as well as answer additional questions.

Getting agile with FWE
Fresh link and mention data have become critically important to online marketers. If you’re engaged in link building and outreach, having the ability to quickly sort recent mentions by source and date can make a world of difference in quick outreach to build audience for your content or brand. If you’re in the SEO trenches, you’re probably all too familiar with how freshness plays a role in Google and Bing search results. If you’ve watched the meteoric rise of sites like Buzzfeed, Business Insider, or Huffington Post, the formula to their success is pretty clear: Match content to the most recent user intent you can surface, then build links and social mentions to that content like crazy.
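As a rough illustration of sorting exported mentions by date and Feed Authority, here is a minimal Python sketch. The column headers (`Date Found`, `Feed Authority`, `Domain`, `Title`, `URL`), the date format, and the sample rows are assumptions made up for this example, not the documented export schema; check them against the header row of a real FWE export before reusing it.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample standing in for an FWE .csv export.
# Header names and the date format are assumptions, not the real schema.
SAMPLE_EXPORT = """Date Found,Feed Authority,Domain,Title,URL
2013-03-18,42,example-news.com,Brand roundup,http://example-news.com/roundup
2013-03-20,17,example-blog.com,Quick mention,http://example-blog.com/mention
2013-03-20,65,example-mag.com,Feature story,http://example-mag.com/feature
"""


def load_mentions(csv_text):
    """Parse export text into dicts with a typed date and numeric score."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["Date Found"] = datetime.strptime(row["Date Found"], "%Y-%m-%d").date()
        row["Feed Authority"] = float(row["Feed Authority"])
        rows.append(row)
    return rows


def freshest_strongest(mentions):
    """Order mentions newest first, breaking ties by higher Feed Authority."""
    return sorted(
        mentions,
        key=lambda m: (m["Date Found"], m["Feed Authority"]),
        reverse=True,
    )


mentions = freshest_strongest(load_mentions(SAMPLE_EXPORT))
for m in mentions:
    print(m["Date Found"], m["Feed Authority"], m["Domain"])
```

Sorting on date first keeps the freshest mentions at the top of an outreach list, with the stronger feeds leading within each day.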
To get started, you can use FWE to engage in several high-ROI activities: Find recent mentions in FWE where you aren’t being linked to –  On news publications and high-volume blogs, the quicker you ask the writer for a link, the better chance you have of actually getting it. It’s much harder to convince them it’s worth the effort a month later. An effective technique that increases your chances even more is to add something new to the content that increases its value or changes the narrative of the story.  Competitor analysis -   Where are your competitors being mentioned? Are there feeds that highlight their content frequently? FWE is a good tool to build up your outreach list. Content Strategy – FWE allows you to check on or keep track of a set of terms over time, and helps you get a sense for what type of content gets a lot of mentions, shares, and links. For instance, a term like “World Cup 2014″   is already drawing significant interest as we get closer to the 2014 event in Brazil. Sites like Bleacher Report and Goal are already starting to stake out their claim in the SERPs . FWE can help you make strategic decisions on how to create and focus both new and legacy content on this type of quickly evolving user search intent. Our engineers have put in a lot of work to make the Freshscape index, and we will be using it to power additional features in the near future. Ready to give it a spin? Try Fresh Web Explorer Just like you, we’re just getting started with Fresh Web Explorer as a new tool in our marketing workflow. It’s a beta release, so we’re making improvements and squashing bugs quickly. You can flag suspicious results within the application, and we will use that feedback to make adjustments to the index.   Please send us over any questions or comments you have, and be sure to check out the Help video and FAQ.   We can’t wait to hear how you’re using it. 
Sign up for The Moz Top 10 , a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


The Broken Link Building Bible

Posted by russvirante

This post was originally in YouMoz, and was promoted to the main blog because it provides great value and interest to our community. The author’s views are entirely his or her own and may not reflect the views of SEOmoz, Inc.

Broken link building may be the most effective white-hat link building strategy in years. In particular, broken link building is appealing because the success of the campaign is directly proportional to how much good you do for the web. You profit only if you create good content to replace lost or abandoned content that webmasters still want to link to. This is the type of strategy that marries so many of the competing interests of our industry: content vs. links, link earning vs. link building, inbound vs. outbound, etc. Below, I attempt to organize as much as I know about broken link building tactics. Throughout the piece I mention tools that will help you make the broken link building process scalable and less monotonous. Let’s begin.

Table of Contents
Overview
Prospecting
  Resource Page Targeting w/ Keywords
    Selecting Keywords
    Prospecting Phrases
    Scraping Search Results
    Extracting URLs
    Header Checks
    Opportunity Qualification
    Prospecting Tools
  Resource Page Targeting w/ Model URL
    Site Selection
    Backlink Acquisition
    Extracting URLs
    Header Checks
    Opportunity Qualification
    Prospecting Shortcuts
  Direct URL Targeting
    Site Crawling
    Opportunity Selection
Content Creation
  Rebuilding Tools
  Raised Expectations
Outreach
  Contact Finding
  Email Templates
Conclusions & Community
Credits

Overview
Broken link building is a link building tactic where a marketer contacts a webmaster who has a broken link on his/her site and recommends one or more alternatives that include his/her target site. For the purposes of this piece, we will use a pediatrician in Raleigh, NC as an example client.

Prospecting
The first step in any broken link building campaign is to find relevant dead pages.
However, there are different methods of prospecting depending upon the broken link building strategy you are employing. There are essentially three types of broken link building strategies:
Resource Page Targeting with Keywords
Resource Page Targeting with URLs
Direct URL Targeting
We will cover each of these in the prospecting section. I will mention multiple tools throughout this post and will give descriptions of all of them at the end.

Keyword Based
The keyword-based method is the most common and, in my opinion, the most straightforward method of broken link building. It involves searching Google for keywords relevant to your site’s interests, finding resource pages that link to content related to your keywords, extracting all the links from those resource pages, finding missing pages among those links, and finally qualifying those opportunities.

Select Prospecting Keywords
Like so many things in SEO, we begin with keyword selection. A successful broken link building campaign lives and dies by the keywords used. There are a couple of characteristics we want to look for in an ideal keyword.
Categorically relevant: This characteristic seems obvious. The prospecting keywords need to be relevant. However, they don’t necessarily have to be relevant to your product, like the key phrase “health resources.” The keywords could be relevant to your audience (“resources for kids”) or your geography (“Raleigh resources”). Remember, you are finding resource pages with these keywords; you are not finding the final targets. You want to cast a wide net, which leads to…
Generally broad: This is where most campaigns fail. Our mock client is unlikely to find any resource pages for the keyword “raleigh nc pediatrician resources,” much less any with good link opportunities. You should choose key phrases that you would consider to be categories that your company might fall in, rather than the specific term.
Prospecting Phrases : Once you have identified your keywords, you will want to pair them with prospecting phrases. These are searches to use in Google or Bing to find relevant resource and links pages like “intitle:resources” or “inurl:links.” Below is a list of prospecting phrases you can use to help find relevant linking pages. site:.gov links resources intitle:links intitle:resources intitle:sites intitle:websites inurl:links inurl:resources inurl:sites inurl:websites “useful links” “useful resources” “useful sites” “useful websites” “recommended links” “recommended resources” “recommended sites” “recommended websites” “suggested links” “suggested resources” “suggested sites” “suggested websites” “more links” “more resources” “more sites” “more websites” “favorite links” “favorite resources” “favorite sites” “favorite websites” “related links” “related resources” “related sites” “related websites” intitle:”useful links” intitle:”useful resources” intitle:”useful sites” intitle:”useful websites” intitle:”recommended links” intitle:”recommended resources” intitle:”recommended sites” intitle:”recommended websites” intitle:”suggested links” intitle:”suggested resources” intitle:”suggested sites” intitle:”suggested websites” intitle:”more links” intitle:”more resources” intitle:”more sites” intitle:”more websites” intitle:”favorite links” intitle:”favorite resources” intitle:”favorite sites” intitle:”favorite websites” intitle:”related links” intitle:”related resources” intitle:”related sites” intitle:”related websites” inurl:”useful links” inurl:”useful resources” inurl:”useful sites” inurl:”useful websites” inurl:”recommended links” inurl:”recommended resources” inurl:”recommended sites” inurl:”recommended websites” inurl:”suggested links” inurl:”suggested resources” inurl:”suggested sites” inurl:”suggested websites” inurl:”more links” inurl:”more resources” inurl:”more sites” inurl:”more websites” inurl:”favorite links” inurl:”favorite resources” inurl:”favorite 
sites” inurl:”favorite websites” inurl:”related links” inurl:”related resources” inurl:”related sites” inurl:”related websites” list of links list of resources list of sites list of websites list of blogs list of forums

Search Results Scraping: You now have the arduous task of finding all the results for all these prospecting phrases. Google is not fond of automated requests, so you have a couple of choices. You can complete the task by hand and use the MozBar to extract results, you can use a SERP scraping tool and risk Google’s ire, or you could look into using the Bing API, which would necessitate changing many of the search operators in the above list of prospecting phrases. Ultimately, you will want to pull down the top 100 results for each of the prospecting phrases you use. You will have quite a bit of crossover, so you will want to de-dupe those lists. You can use Virante’s free “Duplicate Deleter” tool to accomplish this, or you can simply use Excel’s remove duplicates function.

Link Extraction: Once you have a culled list of potential “linking pages,” you need to extract every external link from these pages and begin the process of finding all the 404s. You can also combine this step with the 404 header check using a tool like Domain Hunter Plus or Check My Links.
Link extraction:
webmaster-toolkit.com
iwebtool.com
code.google.com
Link extraction and 404 header check:
Domain Hunter Plus
Check My Links

404 / Error Checking: Once you have extracted all the links, you will have to check the headers on each link to determine whether or not they are 404s, our ultimate target. If you used Domain Hunter Plus or Check My Links, you can skip this process. The easiest way to do this is with a simple HTTP status code checker. There is a free bulk tool here. Just copy and paste all your URLs there, without the http://, and it will find all the 404s for you.
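The steps above (pairing keywords with prospecting phrases, de-duping result URLs, extracting external links, and checking headers) can be sketched in Python using only the standard library. This is an illustrative outline under stated assumptions, not how any of the tools mentioned work: the function names are made up for this example, the sample HTML and URLs are placeholders, and fetching the search results themselves is deliberately left out.

```python
import itertools
import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


def build_queries(keywords, phrases):
    """Pair every keyword with every prospecting phrase."""
    return [f"{kw} {ph}" for kw, ph in itertools.product(keywords, phrases)]


def dedupe(urls):
    """Drop repeated URLs while keeping first-seen order."""
    return list(dict.fromkeys(urls))


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def external_links(html, page_url):
    """Return absolute http(s) links on a page that point to other hosts."""
    collector = _LinkCollector()
    collector.feed(html)
    page_host = urlparse(page_url).netloc.lower()
    found = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc.lower() != page_host:
            found.append(absolute)
    return dedupe(found)


def head_status(url, timeout=10):
    """Send a HEAD request; return the final status code, or None if unreachable.

    urlopen follows redirects, so this is the status of the last hop.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses are raised as exceptions
    except OSError:
        return None  # DNS failure, refused connection, timeout, etc.


# Placeholder inputs for illustration only.
queries = build_queries(
    ["health resources", "resources for kids"],
    ["intitle:resources", "inurl:links"],
)
sample_html = (
    '<a href="/about">About</a> '
    '<a href="http://example.org/guide">Guide</a> '
    '<a href="http://example.org/guide">Guide again</a> '
    '<a href="mailto:someone@example.com">Mail</a>'
)
links = external_links(sample_html, "http://www.example.gov/links")
```

A link whose `head_status` comes back as 404 would be a candidate for the qualification step. Some servers reject HEAD requests, so a fallback GET may be needed in practice, and any crawler should be throttled out of respect for the sites being checked.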
Opportunity Qualification: There are two things you will want to determine about each potential opportunity to vet them for quality: relevance and backlinks.

Backlink acquisition: Once you have found a set of 404 pages, you now have to filter them to determine which are actually strong targets. The more backlinks pointing to a 404 page, the more opportunities you have for link replacement. These linking domains will be the sites you contact to replace the broken link with your own. There are several ways to do this, but the easiest at the moment is likely Majestic SEO’s bulk backlink checker. Remember, at this point you are just trying to get an idea of which pages have the most links so you can ignore those with very few. This will limit the amount of time you spend checking relevance.

Relevance analysis: Now you have filtered your list of 404 opportunities to those with a good number of unique linking domains. Let’s say that number is 50 or more. You now have to determine the relevancy of that content. You can do that a few ways:
Visit the Wayback Machine to find archived copies of the URL in history. If the page is well linked and did not block web crawlers, you should be able to find the content here.
If this is not available, you can look at the anchor text of the links pointing to the page. You can use SEOmoz Open Site Explorer to get an export of the anchor text.
You can look at the URL itself for hints as to how relevant the content would be.
You can visit the linking pages to see if those links have descriptions of what the previous content was.

Prospecting Shortcuts: There are two tools that you could use to jump over a lot of these steps.
Broken Link Index (brokenlinkindex.com): This tool by iAcquire allows you to find tons of potential 404 pages from their gigantic database of opportunities.
Unfortunately, all of the link qualifications have to be done one at a time, although you could export the list and automate the process if you are savvy.
Broken Link Builder (brokenlinkbuilding.com): This tool by CitationLabs is not free, but allows you to perform all of the actions above in an automated fashion. Just type in your keywords and it performs all of the steps above, from finding opportunities to qualifying them based on links and relevance. This is by far the most robust broken link building tool currently available and a huge time saver.

Resource Page Targeting w/ Model URL
Unlike using keywords, this method starts with a known site and mines its backlinks for relevant resource pages that, in turn, produce broken link building opportunities.

Site / URL Selection: This is by far the most important part of the process. Choosing the right site will make or break this strategy. I do want to give a nod to Garrett French for pointing this method out to me a few months ago. There are a couple of factors you want to use in identifying the perfect site or URL.
Non-commercial: In most cases, you want a non-commercial source. If the site has a direct incentive to acquire links, chances are there will be too much manipulated link noise in its backlink profile to properly mine it for broken link building opportunities.
Authoritative: If the site is not authoritative, it likely has attracted few links from resources that aggregate important links on the web. These are the resource pages from which we will find 404 opportunities. If they aren’t linking to your selected URL, you are wasting your time.
Relevant: Obviously, the site needs to be relevant to your industry. You can use this technique to find great opportunities based on nasa.gov, but unless you are SpaceX, you probably have no business doing so.

Backlink Acquisition: Suppose for this method that our example client is a Raleigh, NC dentist, and let’s assume that we selected the American Dental Association (ADA.org).
Using Open Site Explorer , Majestic SEO , or A Hrefs , export all of the links pointing back to this site. This list of URLs should be treated in the same way as the list of URLs in the keyword method that were pulled from searching Google with prospecting phrases. You can now skip to the Link Extraction section in the previous description and follow from there. The steps are identical, no need to repeat them. Direct URL Targeting This is the least scalable of the strategies and is used specifically to target a single link prospect. Unlike the previous two methods where you are trying to find potential broken content to replace and your link prospects are those who link to that broken content, in this method you have already chosen your link prospect and you simply want to find broken links on his/her site as an excuse to start a conversation. I hesitate to include this strategy because it is weak and unscalable, but it is a part of the grouping of strategies known as “broken link building” so I will include it. Let’s assume that you are the Raleigh, NC dentist and you have decided that all you really want is a link from ADA.org. You feel that you have some great content they would link to if only you had a reason to open up a conversation that didn’t sound completely like begging. Well, the first step is to try and find a broken link on their site so you have a reason to reach out to their webmaster. Site Crawling : Site crawling can be problematic because you must balance your need for relatively quick responses and a general respect for the site owner’s bandwidth and uptime. Do not turn on a crawler that you are not certain follows polite crawling policies and obeys robots.txt. Your best bet would be one of the following: Xenu Link Sleuth A classic SEO tool, Xenu Link Sleuth makes it easy to spider a site and find broken links among other problems. 
Screaming Frog SEO Quickly becoming the spider of choice for many SEOs, Screaming Frog can quickly spider your site to diagnose everything from duplicate content to 404s.
Deep Trawl Often overlooked, Deep Trawl is a worthy contender for solving on-site issues.

Opportunity Selection: You now have a list of broken links on your ideal linking website. Identifying the best opportunity will greatly increase the likelihood of succeeding with this strategy. Here are a couple of pointers.
Choose a broken link opportunity where the link is external. This does two things: it makes the webmaster feel like it is not his/her fault (unlike an internal link), and it creates a 1:1 ratio of removing an external link and hopefully adding your external link. A webmaster is far more likely to replace a broken external link with another external link than to replace an internal link with an external one.
Try to choose a broken link on the same page as the one where your link would fit best. This is most likely to occur if your ideal linking site has a resources section.

Content Creation
The next step in the broken link building process is creating content that matches or improves upon the broken page. The first step you will need to take is actually determining what the broken page was. We assume that you have already vetted this page for relevance so you should have a general idea, but getting as specific as possible will help you create content that meets the expectations of all of those who previously linked to the now defunct resource. There are two tools that can help with this right off the bat…

Rebuilding Tools:
Wayback Machine: The Wayback Machine at Archive.org allows you to see much of the web as it existed in history. This is your first and best bet for finding the content. Pro-tip: Use Majestic SEO’s historical index to find when the links were acquired, and then choose the date in Archive.org that corresponds with this date.
This will help you know the mindset of the linkers if the content changed over time.
Warrick: Warrick is a little-known tool by the Comp Sci department at Old Dominion that helps you rebuild an entire website by searching through public proxies/mirror caches to find copies of lost content. This is particularly good for rebuilding content that was blocked by robots.txt. Unfortunately, Warrick is a Perl program that may be difficult to operate.

Raised Expectations: Chances are the site for which you are replacing content has greater authority in the industry than does yours. Chances are it is less commercial, more informative, and more trustworthy in general. If you want to acquire a decent return on investment, you need to focus intently on content quality. Expect to improve upon the content that was created. Update relevant statistics. Add new citations and sections. Consider reaching out to the original author for more information to add credibility.

Outreach
So, you have found your opportunity, created your list of link opportunities, and you are ready to start outreach. Here is how to make the most out of that link list you have.

Contact Finding: There are a growing number of resources for automating the process of contact discovery, although each comes with its own set of issues.
CitationLabs Contact Finder
Link Research Tools Contact Finder
SEOGadget’s Contact Finder
Raven Tools Contact Finder
BuzzStream
Virante’s Contact Finder: In Beta

Email Templates: There are many strategies you can employ in the outreach; here are a few of them, depending on how transparent you want to be. We find, in general, that if you write good enough content you can be very transparent.
Act as a user who happened upon the broken link
Mix your link in with other valuable, related links
Offer the replacement in a follow-up email

Below is an example of a broken link building outreach email.
The most important part of the outreach process is that you should tailor your outreach at least to the specific campaign and industry, if not to each target specifically. If you can add even a sentence of plausible, relevant customization to each email you send out, you will greatly increase your conversion. I promise you, if you copy and paste this template you will waste a lot of your opportunities, no matter how good it is.

Subject line: quick note – dead resource on your site

Hello,

I’m a licensed (industry specialist) and a health writer – I recently visited your site while researching for an article I’m working on…

This is a note for your webmaster, as I found a dead resource on your site that visitors like me surely miss. It’s on this page:
http://www.theirsite.gov/linksandresources

I got an error message when I tried to click on this site:
http://DeadURL.org/index.jsp

It looks like they made a change to their home page but didn’t update it… anyhow, the correct link is here:
http://www.FixedURL.org/

And while you’re updating your page, I wondered if you’d be open to including some further resources that could help people struggling with similar issues.

Compelling Content Title
http://www.clientsite.org/compellingcontent

Compelling Content Title 2
http://www.clientsothersite.com/compellingcontent

Thanks for your help and for providing great resources!

Best,
First Name Last Name
Industry Credentials
clientsite.org

Anthony Nelson has some fantastic templates here from his excellent piece “Broken Link Building Guide from Noob to Novice”.

Conclusions & Community
Like nearly any link building technique, sweat equity is ultimately going to make the difference between a successful campaign and a failure. The devil is always in the details. With that, I would like to see this become a living document. Broken link building, while not a new technique, is becoming more and more scalable.
As more agencies, consultants and business owners jump on the bandwagon, their voices need to be heard as well. So, if you know any tips or tricks, please feel free to include them in the comments here. Thanks, and happy broken link building!

Credit Where Due
While I would like to pretend that most of my knowledge came from divine inspiration or on-the-job learning, the truth is that many thought leaders have chimed in on broken link building. This posting can be attributed in part to conversations with or content provided by the following great SEOs: Jon Cooper, Garrett French, Anthony Nelson, Matt Zaffina, Paddy Moogan


Find Your Site’s Biggest Technical Flaws in 60 Minutes

Posted by Dave Sottimano

I’ve deliberately put myself in some hot water to demonstrate how I would do a technical SEO site audit in 1 hour to look for quick fixes (and I’ve actually timed myself just to make it harder). For the pros out there, here’s a look into a fellow SEO’s workflow; for the aspiring, here’s a base set of checks you can do quickly. I’ve got some lovely volunteers who have kindly allowed me to audit their sites to show you what can be done in as little as 60 minutes.

I’m specifically going to look for crawling, indexing and potential Panda-threatening issues like:
Architecture (unnecessary redirection, orphaned pages, nofollow)
Indexing & Crawling (canonical, noindex, follow, nofollow, redirects, robots.txt, server errors)
Duplicate content & On page SEO (repeated text, pagination, parameter based, dupe/missing titles, h1s, etc.)

Don’t worry if you’re not technical; most of the tools and methods I’m going to use are very well documented around the web.

Let’s meet our volunteers!
http://cvcsports.com/
http://www.webrevolve.com
http://www.lexingtonlaw.com/

Here’s what I’ll be using to do this job:
SEOmoz toolbar – Make sure highlight nofollow links is turned on, so you can visibly diagnose crawl path restrictions
Screaming Frog Crawler – Full website crawl with Screaming Frog (User agent set to Googlebot) – Full user guide here
Chrome and Firefox (FF will have Javascript and CSS disabled and User Agent set to Googlebot) – To look for usability problems caused by CSS or Javascript
Google search queries – To check the index for issues like content duplication, dupe subdomains, penalties, etc.

Here are other checks I’ve done, but left out in the interest of keeping it short:
Open Site Explorer – Download a back link report to see if you’re missing out on links pointing to orphaned, 302 or incorrect URLs on your site.
If you find people linking incorrectly, add some 301 rules on your site to harness that link juice http://www.tomanthony.co.uk/tools/bulk-http-header-compare/  - Check if the site is redirecting Googlebot specifically  http://spyonweb.com/  - Any other domains connected you should know about? Mainly for duplicate content http://builtwith.com/  - Find out if the site is using Apache, IIS, PHP and you’ll know which vulnerabilities to look for automatically Check for hidden text, CSS display:none funniness, robots.txt blocked external JS files, hacked / orphaned pages My essential reports before I dive in: Full website crawl with Screaming Frog (User agent set to Googlebot) A report of everything in Google’s index using the site: (1000 results per query unfortunately – this is how I do it ) Down to business… Architecture Issues 1) Important broken links We’ll always have broken links here and there, and in an ideal world they would all work. Just make sure for SEO & usability that important links (homepage) are always in good shape. The following broken link is on webrevolve homepage that should be pointing to their blog, but returns a 404. This is an important link because it’s a great feature and I definitely do want to read more of their content.     Fix: Get in there and point that link to the correct page which is http://www.webrevolve.com/our-blog/ How did I find it: Screaming Frog > response codes report 2) Unnecessary Redirection This happens a lot more than people like to believe. The problem is that when we 301 a page to a new home we often forget to correct the internal links pointing to the old page (the one with the 301 redirect).  This page http://www.lexingtonlaw.com/credit-education/foreclosure.html 301 redirects to  http://www.lexingtonlaw.com/credit-education/foreclosure-2.html However, they still have internal links pointing to the old page. 
http://www.lexingtonlaw.com/credit-education/bankruptcy.html?linkid=bankruptcy
http://www.lexingtonlaw.com/blog/category/credit-repair/page/10
http://www.lexingtonlaw.com/credit-education/bankruptcy.html?select_state=1&linkid=selectstate
http://www.lexingtonlaw.com/credit-education/collections.html
Fix: Get in that CMS and change the internal links to point to http://www.lexingtonlaw.com/credit-education/foreclosure-2.html
How did I find it: Screaming Frog > response codes report

3) Multiple subdomains – Canonicalizing the www or non-www version
This is one of the first basic principles of SEO, and there are still tons of legacy sites that are tragically splitting their link authority by not redirecting the www to non-www or vice versa. Sorry to pick on you, CVCSports :S
http://cvcsports.com/
http://www.cvcsports.com/
Oh, and a couple more have got their way into Google’s index that you should remove too:
http://smtp.cvcsports.com/
http://pop.cvcsports.com/
http://mx1.cvcsports.com/
http://ww.cvcsports.com/
http://www.buildyourjacket.com/
http://buildyourjacket.com/
Basically, you have 7 copies of your site in the index.
Fix: I recommend using www.cvcsports.com as the main page, and you should use your htaccess file to create 301 redirects for all of these subdomains to the main www site.
How did I find it? Google query “site:cvcsports.com -www” (I also set my results number to 100 to check through the index quicker)

4) Keeping URL structure consistent
It’s important to note that this only becomes a problem when external links are pointing to the wrong URLs. *Almost* every back link is precious, and we want to ensure that we get maximum value from each one. Except we can’t control how we get linked to: without www, with capitals, or with trailing slashes, for example. Short of contacting the webmaster to change it, we can always employ 301 redirects to harness as much value as possible. The one place this shouldn’t happen is on your own site.
We all know that www.example.com/CAPITALS is different to www.example.com/capitals when it comes to external link juice. As good SEOs we typically combat human error by having permanent redirect rules to enforce only one version of a URL (ex. forcing lowercase), which may cause unnecessary redirects if someone links to a version that contradicts those rules.

Here are some examples from our sites:
http://www.lexingtonlaw.com/credit-education/rebuild-credit 301s to the trailing slash version
http://webrevolve.com/web-design-development/conversion-rate-optimisation/ redirects to the www version

Fix: Determine your URL structure. Should all URLs have trailing slashes, www, lowercase? Whatever you decide, be consistent and you can avoid future problems. Crawl your site, and fix these.

Indexing & Crawling

1) Check for Penalties

None of our volunteers have any immediately noticeable penalties, so we can just move on. This is a 2 second check that you must do before trying to nitpick at other issues.

How did I do it? Google search queries for exact homepage URL and brand name. If it doesn’t show up, you’ll have to investigate further.

2) Canonical, noindex, follow, nofollow, robots.txt

I always do this so I understand how clued up SEO-wise the developers are, and to gain more insight into the site. You wouldn’t check for these tags in detail unless you had just cause (ex. a page that should be ranking isn’t). I’m going to combine this section as it requires much more than just a quick look, especially on bigger sites.

First and foremost, check robots.txt and look through some of the blocked directories. Try to determine why they are being blocked and which bots they are being blocked from. Next, get Screaming Frog in the mix, as its internal crawl report will automatically check each URL for Meta Data (noindex, header level nofollow & follow) and give you the canonical URL if there happens to be one.
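If you want to spot check a handful of URLs by hand rather than through a full crawl, the two checks described above can be sketched in Python with only the standard library. This is a minimal sketch, not how Screaming Frog does it: the function and class names are made up for illustration, it works on robots.txt text and page HTML you have already downloaded, and it ignores the X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class IndexingTagParser(HTMLParser):
    """Collect the meta robots directive and canonical URL from a page."""

    def __init__(self):
        super().__init__()
        self.robots = None
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots = (attrs.get("content") or "").lower()
        elif tag == "link" and "canonical" in (attrs.get("rel") or "").lower().split():
            self.canonical = attrs.get("href")


def indexing_tags(html):
    """Return the meta robots value and canonical href found in the HTML, if any."""
    parser = IndexingTagParser()
    parser.feed(html)
    return {"robots": parser.robots, "canonical": parser.canonical}


def is_blocked(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text disallows this URL for the bot."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return not rules.can_fetch(user_agent, url)
```

For a page that is set to NOINDEX,FOLLOW, indexing_tags would report "noindex,follow" under "robots", and a page whose canonical points to itself would show its own URL under "canonical".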
If you’re spot checking a site, the first thing you should do is understand what tags are in use and why they’re using them.

Take Webrevolve for instance: they’ve chosen to NOINDEX,FOLLOW all of their blog author pages.
http://www.webrevolve.com/author/tom/
http://www.webrevolve.com/author/paul/

This is a guess, but I think these pages don’t provide much value, and are generally not worth seeing in search results. If these were valuable, traffic driving pages, I would suggest they remove NOINDEX, but in this case I believe they’ve made the right choice.

They also implement self-serving canonical tags (yes I just made that up); basically, each page will have a canonical tag that points to itself. I generally have no problem with this practice as it usually makes it easier for developers.

Example: http://www.webrevolve.com/our-work/websites/ecommerce/

3) Number of pages VS Number of pages indexed by Google

What we really want to know here is how many pages Google has indexed. There are two ways of doing this. If you have access to Google Webmaster Tools, submit a sitemap and you’ll get stats back on how many URLs are actually in the index. Or you can do it without having access, but it’s much less efficient. This is how I would check…
Run a Screaming Frog Crawl (make sure you obey robots.txt)
Do a site: query
Get the *almost never accurate* results number and compare it to the total pages in the crawl

If the numbers aren’t close, like CVCSports (206 pages vs 469 in the index), you probably want to look into it further.

I can tell you right now that CVCSports has 206 pages (not counting those that have been blocked by robots.txt). Just by doing this quickly I can tell there’s something funny going on and I need to look deeper. Just to cut to the chase, CVCSports has multiple copies of the domain on subdomains, which is causing this.

Fix: It varies. You could have complicated problems, or it might just be as easy as using canonical, noindex, or 301 redirects.
Don’t be tempted to block the unwanted pages by robots.txt, as this will not remove pages from the index and will only prevent these pages from being crawled.

Duplicate Content & On Page SEO

Google’s Panda update was definitely a game changer, and it caused massive losses for some sites. One of the easiest ways of avoiding at least part of Panda’s destructive path is to avoid all duplicate content on your site.

1) Parameter based duplication

URL parameters like search= or keyword= often cause duplication unintentionally. Here are some examples:
http://www.lexingtonlaw.com/credit-repair-news/economic-and-credit-trends/mortgage-lenders-rejecting-more-applications.html
http://www.lexingtonlaw.com/credit-repair-news/economic-and-credit-trends/mortgage-lenders-rejecting-more-applications.html?select_state=1&linkid=selectstate
http://www.lexingtonlaw.com/credit-repair-news/credit-report-news/california-ruling-sets-off-credit-fraud-concerns.html
http://www.lexingtonlaw.com/credit-repair-news/credit-report-news/california-ruling-sets-off-credit-fraud-concerns.html?select_state=1&linkid=selectstate
http://www.lexingtonlaw.com/credit-repair-news/economic-and-credit-trends/one-third-dont-save-for-christmas.html
http://www.lexingtonlaw.com/credit-repair-news/economic-and-credit-trends/one-third-dont-save-for-christmas.html?select_state=1&linkid=selectstate
http://www.lexingtonlaw.com/credit-repair-news/economic-and-credit-trends/financial-issues-driving-many-families-to-double-triple-up.html
http://www.lexingtonlaw.com/credit-repair-news/economic-and-credit-trends/financial-issues-driving-many-families-to-double-triple-up.html?select_state=1&linkid=selectstate

Fix: Again, it varies. If I was giving general advice I would say use clean links in the first place – depending on the complexity of the site you might consider 301s, canonical tags or even NOINDEX. Either way, just get rid of them!

How did I find it?
Screaming Frog > Internal Crawl > Hash tag column

Basically, Screaming Frog will create a unique hexadecimal number based on each page’s source code. If two URLs have matching hash tags, they have duplicate source code (exact dupe content). Once you have your crawl ready, use Excel to filter it out (complete instructions here).

2) Duplicate Text content

Having the same text on multiple pages shouldn’t be a crime, but post-Panda it’s better to avoid it completely. I hate to disappoint here, but there’s no exact science to finding duplicate text content.

Sorry CVCSports, you’re up again ;)
http://www.copyscape.com/?q=http%3A%2F%2Fwwww.cvcsports.com%2F

Don’t worry, we’ve already addressed your issues above; just use 301 redirects to get rid of these copies.

Fix: Write unique content as much as possible. Or be cheap and stick it in an image, that works too.

How did I find it? I used http://www.copyscape.com, but you can also copy & paste text into Google search.

3) Duplication caused by pagination

Page 1, Page 2, Page 3… You get the picture. Over time, sites can accumulate thousands if not millions of duplicate pages because of those nifty page links. I swear I’ve seen a site with 300 pages for one product page.

Our examples:
http://cvcsports.com/blog?page=1
http://cvcsports.com/blog?page=2
Are they being indexed? Yes.

Another example?
http://www.lexingtonlaw.com/blog/page/23
http://www.lexingtonlaw.com/blog/page/22
Are they being indexed? Yes.

Fix: General advice is to use the NOINDEX, FOLLOW directive. (This tells Google not to add this page to the index, but to still crawl through the page.) An alternative might be to use the canonical tag, but this all depends on the reason why pagination exists. For example, if you had a story that was separated across 3 pages, you definitely would want them all indexed. However, these example pages are pretty thin and *could* be considered low quality by Google.

How did I find it?
Screaming Frog > Internal links > Check for pagination parameters

Open up the pages and you’ll quickly determine if they are auto generated, thin pages. Once you know the pagination parameter or structure of the URL you can check Google’s index like so: site:example.com inurl:page=

Time’s up!

There’s so much more I wish I could do, but I was strict about the 1 hour time limit. A big thank you to the brave volunteers who put their sites forward for this post. There was one site that just didn’t make the cut, mainly because they’ve done a great job technically, and, um, I couldn’t find any technical faults.

Now it’s time for the community to take some shots at me! How did I do? What could I have done better? Any super awesome tools I forgot? Any additional tips for the volunteer sites?

Thanks for reading, you can reach me on Twitter @dsottimano if you want to chat and share your secrets ;)

See more here:
Find Your Site’s Biggest Technical Flaws in 60 Minutes