SEO Case Study: Using Black Hat Resources for White Hat Gains

February 12, 2012 in Seo

I took a list of domains from a black hat SEO forum to see whether I could find any quality backlinks, acquired through black hat automation methods, that could be used for white hat SEO. While curating this list I also sought a greater understanding of the rating guidelines used by Google’s new machine learning algorithm, Panda, which, if you are not aware, has input from several quality raters, most significantly the head of Google’s web spam team, Matt Cutts.

Several quality guidelines, such as those outlined below, were scaled and molded into an algorithm that Google runs (by some estimates, every month) to combat web spam. The easiest way to think of the algorithm is as an anti-virus program that Google runs periodically to combat misleading search results propagated by black hat SEOs, who primarily boost irrelevant, advertising-packed websites in order to make money from user actions and views (PPC and PPV).

The list of domains was generated automatically with a program called XRumer, which, among other functions, allows a user to “scrape” SERPs (Google search result pages) for websites that meet user-defined criteria. The domain list I started working with was primarily Pligg-based (Pligg is a Digg.com knock-off script) and was originally titled “800 social bookmarking sites.” Below are the grading criteria I used to determine whether posting to one of these sites would have any value for my site.


How Google Ranks Websites

As many SEOs know, Google uses two primary ranking factors to determine where websites sit in the results: popularity and relevance.

Ideally you want both factors present when seeking a backlink from a domain, but both are not required. Consider adult websites. By nature they run against Google’s quality guidelines, so Google often penalizes them for using adult-related terms, forcing them to rank poorly for relevant terms.

To make up for this lack of relevance in the search engines, adult webmasters have relied on popularity to rank well. Several directory and news sites rank in a similar fashion: because they aggregate content or have webmasters/SEOs post duplicate content, they rely primarily on traffic and links to rank well in Google.

I applied the 80/20 principle to the list of domains to narrow down the sites I had to check manually. Even after applying these basic principles, I still had to check 426 websites by hand.

Top Level Domains to Observe

The first factor I sorted by was TLD (top-level domain). While curating this list I checked every individual domain with the TLDs listed below to confirm my hypotheses. I have also provided a short justification as to why I believe the quality of the content associated with each TLD is poor, based on my observations and experience.

It’s also important to note that users based in the United States are less likely to click on any listing in the SERPs that is not .com, .net, or .org. Google has also stressed user experience above all with the new Panda update, so to keep your backlinks indexed and passing link juice, aim for sites that are beneficial to users.

It’s not always about the user, however; sometimes you just need backlinks to help with SEO. Google does not impose heavy penalties on websites for having a few poor links. Just keep the number of poor/junk backlinks low to avoid being pigeonholed by Google as associated with junk sites.

TLD Phrases Legend:

  • “Typically non-Germanic characters”: I hypothesize that link weight will be lessened due to translation issues and user frustration at having to translate every page, which means a high bounce rate for Germanic-speaking cultures. Bounce rate is now a quality ranking factor used by Google, and a consistently high bounce rate now lowers SERP rankings.
  • “Temporary”: If the owner can’t spring for a higher-priced TLD such as .com or .net, they usually don’t produce quality sites.
  • “Personal”: Blogs often lack focus and are cluttered with personal posts, so less material relevant to your niche is produced.
  • “Case by Case”: I would recommend looking at the domain and running it past some of the other filters listed below before discrediting it.
  • “Poor Regulation/Spammy”: A general catch-all for TLDs that, based on my observation, produce poor, spammy, or duplicate content.

[Image: list of spammy TLDs]
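The TLD sort described above can be sketched in Python. This is a minimal sketch, not the author's tool: the category labels come from the legend, but which TLD maps to which label is a hypothetical placeholder, since the original TLD list survives only as an image. Anything not in the mapping comes back as "Unclassified" so it can be checked by hand.

```python
from urllib.parse import urlsplit

# Category labels are from the legend. The TLD assignments are HYPOTHETICAL
# placeholders; the author's actual TLD list was published as an image.
TLD_LEGEND = {
    "ru": "Typically non-Germanic characters",
    "cn": "Typically non-Germanic characters",
    "tk": "Temporary",
    "me": "Personal",
    "co": "Case by Case",
    "info": "Poor Regulation/Spammy",
}
# The TLDs US users click most readily, according to the post.
PREFERRED_TLDS = {"com", "net", "org"}


def hostname(entry: str) -> str:
    """Lowercase hostname from a bare domain ('example.com') or a full URL."""
    if "//" not in entry:
        entry = "//" + entry  # urlsplit only recognises a host after '//'
    return (urlsplit(entry).hostname or "").rstrip(".")


def classify_tld(entry: str) -> str:
    """Map a scraped domain or URL to a legend category for sorting."""
    tld = hostname(entry).rsplit(".", 1)[-1]
    if tld in PREFERRED_TLDS:
        return "Preferred"
    return TLD_LEGEND.get(tld, "Unclassified")


if __name__ == "__main__":
    for site in ["example.com", "http://example.ru/story/1", "example.tk", "example.xyz"]:
        print(site, "->", classify_tld(site))
```

Sorting the scrape list by `classify_tld` groups the preferred .com/.net/.org entries together and leaves the unfamiliar TLDs in their own bucket for review.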

The following section is dedicated to name-specific factors, that is, unfavorable features in the domain name itself, for example: 123-poor-spammy-name.example.com

Hyphenated domains

Users are less likely to click on hyphenated domains because, based on past experience, they associate them with spam. I did not want to rule these out completely, because sometimes site owners are merely unoriginal when it comes to naming their websites. On some occasions inexperienced SEOs may think hyphenated domains help with ranking, so they opt for a hyphenated domain while still running a quality site. I urge you to look at the first few domains in a scrape list for positive trends/quality indicators before writing these domains off as poor quality.
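The "check the first few before writing them off" advice can be sketched as follows, assuming the scrape list is a plain Python list of domain names in scrape order. The function names and the default sample size of 5 are hypothetical, and the site name is taken as the label just left of the TLD, a simplification that mishandles suffixes such as .co.uk.

```python
def name_label(domain: str) -> str:
    """Return the label just left of the TLD, e.g. 'my-site' for 'www.my-site.com'.

    Simplification: multi-part suffixes such as .co.uk are not handled; a real
    tool would consult the Public Suffix List.
    """
    labels = domain.lower().rstrip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def split_hyphenated(domains, sample_size=5):
    """Split hyphenated domains into a spot-check sample and a deferred rest.

    Nothing is discarded: the first `sample_size` hyphenated domains (in scrape
    order) are reviewed by hand before deciding what to do with the rest.
    """
    hyphenated = [d for d in domains if "-" in name_label(d)]
    return hyphenated[:sample_size], hyphenated[sample_size:]
```

Only the site name itself is checked, so `sub-dir.example.org` is not flagged while `www.c-d.net` is.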

Subdomains and subdirectories

Subdomains and subdirectories pass similar link value; however, as Matt Cutts and other SEO professionals have mentioned in the past, the further you get from a domain name’s root, the less link value you will have. Since I am seeking link value from the entire domain and not just the subdirectory, I look at the root domain value of a site and not the PageRank of the subdomain.

example.com/directory-example/
directory-example.example.com

vs.

example.com/directory
directory.example.com

The last two examples will pass more link value because they are closer to the domain root. However, newer domains can benefit from the first two forms because they contain an extra keyword to rank for, which helps with the relevance factor. If the domain already ranks solidly for the keyword, then a shorter URL is favorable.
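Because the evaluation is done on the root domain rather than on the subdomain or subdirectory, a scrape list can first be collapsed to root domains so each site is checked once. A sketch under the assumption that the root is the last two hostname labels (a real tool would consult the Public Suffix List); the function names are hypothetical.

```python
from urllib.parse import urlsplit


def root_domain(url: str) -> str:
    """Reduce a subdomain or subdirectory URL to its root domain.

    Simplification: keeps the last two hostname labels, so multi-part
    suffixes such as .co.uk would need the Public Suffix List instead.
    """
    if "//" not in url:
        url = "//" + url  # let urlsplit treat the first segment as the host
    host = (urlsplit(url).hostname or "").rstrip(".")
    return ".".join(host.split(".")[-2:])


def unique_roots(urls):
    """Root domains in first-seen order, without duplicates."""
    return list(dict.fromkeys(root_domain(u) for u in urls))


examples = [
    "example.com/directory-example/",
    "directory-example.example.com",
    "example.com/directory",
    "directory.example.com",
]
print(unique_roots(examples))  # ['example.com']
```

All four example URLs collapse to the same root, so the domain is graded once rather than four times.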

Misspelled, misleading, and gibberish domain names

These domain names offer a poor user experience and are generally more likely to carry malware/viruses. I personally would never link to them.

For example, altervista.org is a play on the reputable altavista.com.
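Play-off names like this can be caught roughly by comparing a domain's name against a watch list of reputable names with Python's `difflib.SequenceMatcher`. The watch list, the 0.8 similarity threshold, and the function names are assumptions for illustration, not part of the original method; a flag only means "look closer", not "malicious".

```python
import difflib

# Hypothetical watch list of reputable names; extend it for your niche.
REPUTABLE_NAMES = ["altavista", "google", "youtube"]


def name_label(domain: str) -> str:
    """Label just left of the TLD (simplified; ignores suffixes like .co.uk)."""
    labels = domain.lower().rstrip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def looks_like_knockoff(domain: str, threshold: float = 0.8):
    """Return the reputable name this domain appears to imitate, or None.

    An exact match is treated as the genuine site; a close but inexact match
    (similarity ratio >= threshold) is flagged as a likely play-off.
    """
    label = name_label(domain)
    for brand in REPUTABLE_NAMES:
        if label == brand:
            return None
        if difflib.SequenceMatcher(None, label, brand).ratio() >= threshold:
            return brand
    return None
```

With these settings `altervista.org` is flagged as imitating `altavista`, while `altavista.com` itself is not.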

Adult and gambling keywords

If a domain contains keywords associated with porn or gambling, Google penalizes it because these types of sites are against Google’s guidelines. I am posting these terms in an image so that my own site does not get penalized.

[Image: list of terms against Google’s guidelines]

Spammy and generic terms

These sites are usually made to boost rankings for highly competitive keywords. If you are trying to market in one of these categories, there are several safer grey hat techniques I would use instead of being associated with someone else’s site; if you ever use grey hat techniques, you want full control over them. If you receive a backlink from a spammy site with these terms and a virus is later uploaded to that site, you are now associated with a malware network and your rankings will suffer. Keep a close eye on these terms and get backlinks from these sites sparingly if you want to remain within Google’s guidelines; if not, submit away! (Terms are put in an image to avoid keyword stuffing.)

[Image: list of competitive terms]

Case-by-case terms

article
content
submit

Any popular tech brand or trend: android, mac, ipod, ipad, etc.
Anything popular or recent in the news
Numbers: usually spammy; unless it’s a clever name or has a lot of traffic, I usually delete it
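The name-based filters in this section can be combined into one pass over the scrape list. In this sketch the blocked-term set is a hypothetical stand-in for the adult, gambling, and competitive terms the post publishes only as images; the case-by-case terms and the "numbers" rule come from the list itself, and tech-brand or news terms would be added by the reader. The function name and the three verdict strings are assumptions.

```python
import re

# HYPOTHETICAL placeholders: the post's adult/gambling and competitive-term
# lists were published as images, so these entries are examples only.
BLOCKED_TERMS = {"casino", "poker"}
# Case-by-case terms listed in the post.
CASE_BY_CASE_TERMS = {"article", "content", "submit"}


def name_verdict(domain: str) -> str:
    """Return 'reject', 'review', or 'pass' for a domain name.

    Plain substring matching is a rough heuristic: it also fires on
    innocent words that happen to contain a listed term.
    """
    name = domain.lower()
    if any(term in name for term in BLOCKED_TERMS):
        return "reject"
    # Case-by-case terms and any digit (the "numbers" rule) go to manual review.
    if any(term in name for term in CASE_BY_CASE_TERMS) or re.search(r"\d", name):
        return "review"
    return "pass"
```

The example name from the start of this section, `123-poor-spammy-name.example.com`, lands in "review" because of its digits.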

If a domain’s Alexa backlinks are low or its Alexa rank is worse than 500,000, I will not use it. Remember, these are still spammy sites; I am only adding them for SEO purposes.

Does the domain have a keyword in the neighborhood of the terms you are trying to rank for? If so, you might want to use it regardless of its rating.

Does the root page have a PageRank? (a quality indicator)
Are there backlinks to it in Google?
Does it rank number 1 in Google for its own domain name? For example, does example.com show up at the top of searches for example.com? If not, it is getting penalized by Google.

Personally, I like to visit every site I am going to use for a backlink to make sure I don’t see terms I absolutely do not want to be associated with, such as anything in the adult industry.

Domain name age: has it been doing what it does for several years without being penalized?
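The popularity and quality checks above can be recorded per domain and combined into a rough decision. This is a sketch, not the author's scoring: the 500,000 Alexa cut-off is from the post (read as "a rank number above 500,000 is too obscure"), while the minimum backlink count, the three-year reading of "several years", and the two-of-three rule are hypothetical. The metric values themselves still have to be gathered by hand, and the manual visit for unwanted content remains a separate step.

```python
from dataclasses import dataclass

ALEXA_RANK_CUTOFF = 500_000  # from the post
MIN_ALEXA_BACKLINKS = 10     # hypothetical: the post only says "low"
MIN_AGE_YEARS = 3            # hypothetical reading of "several years"


@dataclass
class DomainMetrics:
    alexa_rank: int                 # lower number = more popular
    alexa_backlinks: int
    has_niche_keyword: bool         # keyword near the terms you target
    root_pagerank: int              # PageRank of the root page
    has_google_backlinks: bool
    ranks_first_for_own_name: bool
    age_years: float


def evaluate(m: DomainMetrics) -> str:
    """Rough verdict: 'candidate', 'review', or a 'reject: ...' reason."""
    if not m.ranks_first_for_own_name:
        return "reject: likely penalized"
    popular = (m.alexa_rank <= ALEXA_RANK_CUTOFF
               and m.alexa_backlinks >= MIN_ALEXA_BACKLINKS)
    # A keyword close to your niche can justify keeping an unpopular domain.
    if not popular and not m.has_niche_keyword:
        return "reject: low popularity"
    quality_signals = sum([m.root_pagerank > 0, m.has_google_backlinks,
                           m.age_years >= MIN_AGE_YEARS])
    return "candidate" if quality_signals >= 2 else "review"
```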

On-site visual cues

Do the story titles appear spammy, promoting sites that clearly submit for a backlink rather than to help the user?
Do the stories have more than just a few votes? (indicates traffic and user interaction with the site)
Is the page written in Russian or Arabic? These are usually poorly regulated, spammy sites.
Does the site have a lot of ads?
Does the site have a pop-up?
Does the site have anything annoying that distracts the viewer? (for stricter guidelines)
Does the site have enough content at first glance? (affects bounce rate, which is now important post-Panda)
Does the site allow automated/spammy commenting?
Does the site have one or more redirects to another page or domain?
Does the site have a malware warning?
Does every story have the same number of votes?
Is the layout user friendly?
Does the page take a long time to load? It could be stuffing your browser with cookies.
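Two of these cues, identical vote counts on every story and stories with only a few votes, can be checked mechanically once the vote counts from a site's front page have been collected. A minimal sketch; the "few votes" threshold of 3 and the use of the median as the typical count are assumptions, and either flag is a prompt for a closer look rather than proof of spam.

```python
from statistics import median

FEW_VOTES = 3  # hypothetical threshold for "just a few votes"


def vote_signals(votes: list[int]) -> dict:
    """Flag two on-site cues from the vote counts shown on a site's stories."""
    if not votes:
        return {"uniform_votes": False, "low_engagement": True}
    return {
        # Every story showing the same count may indicate automated voting.
        "uniform_votes": len(votes) > 1 and len(set(votes)) == 1,
        # A low typical count suggests little real traffic or interaction.
        "low_engagement": median(votes) <= FEW_VOTES,
    }
```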

Site SEO factors

Does the site use canonical tags or 301 redirects to consolidate the “www.” and non-“www.” versions of the domain?
Does it use HTTPS? If the owner paid to secure the domain, it has to have some innate qualities.
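Both site SEO factors can be checked with the Python standard library. This sketch assumes you already have the page HTML for the canonical-tag check; the redirect check makes live HTTP requests, relies on `urlopen` following redirects by default, and only compares the final hostnames, so it does not distinguish a 301 from other redirect types. All function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit
from urllib.request import urlopen


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


def resolved_url(url: str, timeout: float = 10.0) -> str:
    """Final URL after following redirects (requires network access)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.geturl()


def www_consolidated(domain: str) -> bool:
    """True if the www and non-www versions end up on the same host."""
    with_www = urlsplit(resolved_url(f"http://www.{domain}")).hostname
    without_www = urlsplit(resolved_url(f"http://{domain}")).hostname
    return with_www == without_www


def uses_https(url: str) -> bool:
    """True if the URL (e.g. one returned by resolved_url) is served over HTTPS."""
    return urlsplit(url).scheme == "https"
```

Passing the output of `resolved_url` to `uses_https` answers the HTTPS question for the page a visitor actually lands on.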
