
Visibility – Search Engine Optimization


Why Search Engines?

  • There are too many pages:
    • Surface Web (public pages)
    • Deep Web (dynamic, scripted, non-HTML, unlinked, private, contextual, limited access)
  • These pages are not randomly scattered but are interconnected.
  • The Surface Web contains pages indexed by the standard search engines.
  • The Deep Web is several orders of magnitude larger than the Surface Web.
  • How many pages are indexed? 124 billion – Cuil (July 15, 2009)

Ecology of the Web

  • Small World Theory
    • Human linkage: In 1967, sociologist Stanley Milgram developed the small-world theory of social networks by showing that any human was separated from any other human by only about six degrees of separation.
    • Web page linkage: Every Web page is thought to be separated from any other Web page by an average of about 19 clicks.
  • Bow Tie Theory – Research conducted jointly by scientists at IBM, Compaq, and AltaVista used a Web crawler to identify 200 million Web pages and follow 1.5 billion links on these pages, and found that the Web was not shaped like a spider web at all, but rather like a bow tie:
    • Strongly connected component (SCC): 56 million Web pages; the core.
    • OUT pages: 44 million pages on the right side of the bow tie; pages you could reach from the center but from which you could not return to the center; these tended to be corporate intranet pages
    • IN pages: 44 million pages on the left side of the bow tie; pages from which you could reach the center but which you could not reach from the center; often recently created pages that had not yet been linked to by many center pages
    • Tendril pages: 43 million pages in the “tubes” that neither linked to the center nor could be reached from it
    • Totally lost pages: 16 million pages totally disconnected from everything
[Figure: the bow-tie structure of the Web]
Source: K. Laudon & C. Traver, E-Commerce 2009 (5th Edition), Prentice Hall.

 

  • Super Node Theory
    • Research performed by Albert-László Barabási at the University of Notre Dame.
    • In an exponentially growing network of some 50 billion Web pages, activity was found to be highly concentrated in “very-connected super nodes” that provide the connectivity to less well-connected nodes.
    • Barabási dubbed this type of network a “scale-free” network, which is highly vulnerable to destruction: destroy its super nodes and the transmission of messages breaks down rapidly.
  • Marketing Implications
    • Search engines cannot find a Web site if it is not well-connected or linked to the central core of the Web.
    • Because e-commerce revenues in part depend on customers being able to find a Web site using search engines, Web site managers need to take steps to ensure their Web pages are part of the connected central core, or “super nodes” of the Web.
    • One way to do this is to make sure the site has as many links as possible to and from other relevant sites, especially to other sites within the SCC.

Top Search Engines – by Volume

Date         Google    Yahoo!    Bing     Ask      Total
2010-04-10   63.01%    11.08%    9.70%    2.69%    86.48%
2010-03-06   71.07%    14.46%    9.55%    3.01%    98.09%
2010-02-06   71.35%    14.60%    9.56%    2.55%    98.06%
2010-01      71.61%    14.76%    9.13%    2.66%    98.16%
(source: www.hitwise.com)

 

  • Yahoo! 
    • Stands for: “Yet Another Hierarchical Officious Oracle”
    • Founded in 1994 by David Filo and Jerry Yang
    • Was originally a collection of Web sites organized by categories
  • Google
    • Founded in 1998 by Larry Page and Sergey Brin
    • Uses Latent Semantic Indexing (LSI) to solve the synonymy and polysemy problems in automatic information retrieval
    • Uses patented PageRank System to calculate an index of popularity
    • Runs on an estimated 450,000 rack servers tied together in thousands of clusters in dozens of data centers around the world.

Search Engine Functionalities

  • Crawling the Web
  • Indexing the pages
  • Calculating popularity scores
  • Caching pages
  • Processing search queries
  • Providing search results

Information Collected by Web Spider

  • Header section:  <head>
  • Page title: <title></title>
  • Keyword and description tags: <meta name="Keywords" content="…">
  • Robots instructions: <meta name="Robots" content="…">
  • Content section
  • Headings and subheadings:  <h1><h2>…<h6>
  • Keywords and phrases noted in the keywords and description meta tags
  • Image alt attribute: <img src="" alt="…">
  • Text links: <a href="…">
  • Text link title attribute: <a href="…" title="…">
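
To make the list above concrete, here is a minimal HTML sketch showing where each of these elements sits on a page; the file names, keywords, and text are made up for illustration (borrowing the baseball-card example used in Google's SEO Starter Guide later in this tutorial):

```html
<html>
  <head>
    <!-- Page title: read by the spider and shown in the search snippet -->
    <title>Brandon's Baseball Cards - Vintage Cards, Price Guides</title>
    <!-- Keyword and description meta tags -->
    <meta name="Keywords" content="baseball cards, vintage cards, price guide">
    <meta name="Description" content="Buy, sell, and price vintage baseball cards.">
    <!-- Robots instructions for crawlers -->
    <meta name="Robots" content="index, follow">
  </head>
  <body>
    <!-- Headings and subheadings -->
    <h1>Vintage Baseball Cards</h1>
    <h2>New arrivals</h2>
    <!-- Image alt attribute describes the picture -->
    <img src="1952-topps-mantle.jpg" alt="1952 Topps Mickey Mantle card">
    <!-- Text link with a title attribute -->
    <a href="price-guide.html" title="Baseball card price guide">See the price guide</a>
  </body>
</html>
```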

Google Search Engine

  • Set to avoid keyword stuffers and spammers
  • Context-based rather than merely keyword-based
  • Use more than 200 signals to examine the entire link structure of the web and determine which pages are most important
  • Use proprietary technologies to determine page importance
    • Latent Semantic Indexing (LSI)
      • A technique used in Natural Language Processing (NLP)
      • Attempts to solve the synonymy and polysemy problems in automatic information retrieval
      • Uses high-dimensional vector and matrix processing (a singular value decomposition of the term-document matrix, computed over multiple iterations)
      • Provide contextual query search (semantic search) instead of a direct keyword match
    • PageRank (PR)
      • Named after Larry Page, co-founder of Google
      • Developed by Larry Page and Sergey Brin in a research project at Stanford University
      • Traditional library model: The more citations other documents make to a particular document, the more “important” the document is, the higher its rank in the system, and the more likely it is to be retrieved first.
      • PageRank model:
        • A probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at any particular page.
        • Correlates well with human concepts of importance, because it derives from human-generated links
        • A PageRank of 0.5 means there is a 50% chance that a person clicking on a random link will be directed to the document with the 0.5 PR.
        • Computes a recursive score (iterative) for web pages, based on the weighted sum of the PRs of the pages linking to them through the link structure
        • A vote of support by other web pages about how important a page is
        • When a user enters a query into Google, the results are returned in the order of their PR.
        • A page is more important if more pages link to it or if the pages linking to it have higher PR.
        • Calculating PR: www.webworkshop.net, Google Toolbar
        • More on PageRank: see the formula sketch after this list
  • Use hypertext-matching analysis to determine which pages are relevant to the specific search being conducted
  • Use other secret criteria
  • Striving for instant search speed (August, 2009)
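
The recursive PageRank calculation described above can be written compactly. Below is a sketch of one common formulation, the version that makes the scores a probability distribution over all pages; Google's production ranking layers many more signals on top of this:

$$PR(A) = \frac{1-d}{N} + d \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$

where $T_1, \dots, T_n$ are the pages linking to page $A$, $C(T_i)$ is the number of outbound links on page $T_i$, $N$ is the total number of pages, and $d \approx 0.85$ is the damping factor (the probability that the “random surfer” keeps clicking links instead of jumping to a random page). The scores start from a uniform guess and are recomputed iteratively until they converge.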

Search Engine Optimization

Search Engine Optimization (SEO) is the process of improving the traffic to a web site from search engines via organic search results, by tweaking the website so that it appears among the top listings on search engine results pages (SERPs).

  • Two SEO techniques
    • White Hat SEO
      • Techniques that search engines recommend as part of good design
      • Conforms to the guidelines of search engines
      • Tend to produce results that last a long time
      • 3 focus areas:
        • On-page optimization
          • Keyword optimization of individual web pages
          • Very important and easy to control
        • On-site optimization
          • The navigation and linking structure of the site
          • Very important and easy to control
        • Off-site optimization
          • Promotion
          • Backlinks
          • Extremely important but very difficult to control
    • Black Hat SEO
      • Techniques that search engines do not approve
      • May be effective in the short run
      • May eventually be reduced in PR or banned, either temporarily or permanently, whether automatically by the search engines’ algorithms or through a manual site review

Google’s SEO Starter Guide

Google’s SEO Starter Guide is a white hat guide developed from Google’s best practices for title tags, meta tags, URL structure, navigation, content, anchor text, headers, images and, of course, robots.txt. As Google claims, following the best practices outlined in the guide will make it easier for search engines to both crawl and index a website’s content. However, the guide doesn’t reveal any of Google’s algorithm secrets, does not necessarily yield what’s best for Internet users, and does not leave much room for innovation in Web site design (e.g., Flash, Flex, etc.). Minimal sketches of an optimized page, a robots.txt file, and an XML Sitemap appear after this guide.

  • <title>
    • Reveals what the topic of a particular page is
    • Ideally a unique title for each page
    • Shown in the snippet of the SERP
    • Homepage: list the name of the business and other important information like the physical location of the business or its main focuses or offerings
    • Deeper pages: accurately describe the focus of that particular page and include the business name.
    • Good practices:
      • Accurately describe the page’s content
      • Create unique title tags for each page
      • Use brief, but descriptive titles
    • Avoid:
      • Choosing a title that has no relation to the content on the page
      • Using default or vague titles like “Untitled” or “New Page 1”
      • Using a single title tag across all of your site’s pages or a large group of pages
      • Using extremely lengthy titles that are unhelpful to users
      • Stuffing unneeded keywords in your title tags
  • <meta>
    • A summary of what the page is about
    • A sentence or two or a short paragraph
    • Might be used as snippets or in the Open Directory Project
    • Add it to each page
    • Good practices:
      • Accurately summarize the page’s content
      • Use unique descriptions for each page
    • Avoid:
      • Writing a description meta tag that is not relevant to the content on the page
      • Using generic descriptions like “This is a webpage” or “Page about baseball cards”
      • Filling the description with only keywords
      • Copying and pasting the entire content of the document into the description meta tag
      • Using a single description meta tag across all of your site’s pages or a large group of pages
  • URL Structure
    • Creating descriptive categories and filenames for better crawling and indexing
    • Making backlinks clear and easy to remember
    • Dynamic URLs: acceptable to Google, but rewriting them incorrectly to look static could cause crawling issues
    • Good practices:
      • Use meaningful words in URLs
      • Create a simple directory structure
      • Provide one version of a URL to reach a document
    • Avoid:
      • Using lengthy URLs with unnecessary parameters and session IDs
      • Choosing generic page names like “page1.html”
      • Using excessive keywords like “baseball-cards-baseball-cards-baseballcards.htm”
      • Having deep nesting of subdirectories like “…/dir1/dir2/dir3/dir4/dir5/dir6/page.html”
      • Using directory names that have no relation to the content in them
      • Having pages from subdomains and the root directory (e.g. “domain.com/page.htm” and “sub.domain.com/page.htm”) access the same content
      • Mixing www. and non-www. versions of URLs in your internal linking structure
      • Using odd capitalization of URLs (many users expect lower-case URLs and remember them better)
  • Navigation
    • All sites should have a home or “root” page
    • Sitemap page: a simple page on your site that displays the hierarchical structure of the website, mainly to guide human visitors
    • Submit XML Sitemap: can be submitted through Google’s Webmaster Tools to help Google navigate the site; also one way (though not guaranteed) to tell Google which version of a URL you’d prefer as the canonical one (e.g. http://brandonsbaseballcards.com/ or http://www.brandonsbaseballcards.com)
    • Google Sitemap Generator
    • Good practices:
      • Create a naturally flowing hierarchy
      • Create a simple directory structure
      • Use mostly text for navigation
      • Use “breadcrumb” navigation (a row of internal links at the top or bottom of the page that allows visitors to quickly navigate back to a previous section or the root page)
      • Put an HTML sitemap page on the site, and submit an XML Sitemap file to Google
      • Consider what happens when a user drops off a part of the URL in the hope of finding more general content; provide a helpful custom 404 page
    • Avoid:
      • Creating complex webs of navigation links, e.g. linking every page on the site to every other page
      • Going overboard with slicing and dicing the content
      • Having a navigation based entirely on drop-down menus, images, or animations (Google likes text.)
      • Letting the HTML sitemap page become out of date with broken links
      • Creating an HTML sitemap that simply lists pages without organizing them
      • Allowing the 404 pages to be indexed in search engines (make sure that the web server is configured to give a 404 HTTP status code when non-existent pages are requested)
      • Using a design for your 404 pages that isn’t consistent with the rest of the site
  • <a>
    • Tells Google something about the page you are linking to
    • The better the anchor text is, the easier it is for Google to understand.
    • Good practices:
      • Write short but descriptive text
      • Format links so they’re easy to spot; users should be able to distinguish anchor text from regular text
    • Avoid:
      • Writing generic anchor text like “page”, “article”, or “click here”
      • Using text that is off-topic or has no relation to the content of the page linked to
      • Using the page’s URL as the anchor text in most cases
      • Writing long anchor text, such as a lengthy sentence or short paragraph of text
      • Using CSS or text styling that makes links look just like regular text
      • Using excessively keyword-filled or lengthy anchor text just for search engines
      • Creating unnecessary links that don’t help with the user’s navigation of the site
  • <img>
    • The filename and contents of the alt attribute provide information about the picture
    • When an image is used as a link, the alt text for that image is treated similarly to the anchor text of a text link
    • Optimizing the image filenames and alt text makes it easier for image search projects like Google Image Search to better understand the images.
    • Good practices:
      • Use brief, but descriptive filenames and alt text
      • Supply alt text when using images as links
      • Store images in a directory of their own
      • Use commonly supported filetypes
    • Avoid:
      • Using generic filenames like “image1.jpg”, “pic.gif”, “1.jpg”
      • Writing extremely lengthy filenames
      • Stuffing keywords into alt text or copying and pasting entire sentences
      • Writing excessively long alt text that would be considered spammy
      • Using only image links for your site’s navigation
  • Quality Content and Services
    • Good practices:
      • Write easy-to-read text
      • Stay organized around the topic
      • Use relevant language (use Google keyword tool) https://adwords.google.com/select/KeywordToolExternal
      • Create fresh, unique content
      • Offer exclusive content or services
      • Create content primarily for your users, not search engines
    • Avoid:
      • Writing sloppy text with many spelling and grammatical mistakes
      • Embedding text in images for textual content (users may want to copy and paste the text and search engines can’t read it)
      • Dumping large amounts of text on varying topics onto a page without paragraph, subheading, or layout separation
      • Rehashing existing content that will bring little extra value to users
      • Having duplicate or near-duplicate versions of your content across the site (Google is fine with a “regular” and a “printer” version of each page, but will penalize sites with intentionally created duplicate content.)
      • Inserting numerous unnecessary keywords aimed at search engines that are annoying or nonsensical to users
      • Deceptively hiding text from users, but displaying it to search engines
  • robots.txt
    • Tells search engines whether they can access and crawl parts of your site
    • Placed in the root directory of your site
    • Use the robots.txt generator in Google Webmaster Tools to create this file
    • Good practices:
      • Use more secure methods for sensitive content. (One reason is that search engines could still reference the URLs you block if there happen to be links to those URLs somewhere on the Internet. Also, search engines that don’t acknowledge the Robots Exclusion Standard could disobey the instructions in your robots.txt. Finally, a curious user could examine the directories or subdirectories in your robots.txt file and guess the URL of the content you don’t want seen.)
    • Avoid:
      • Allowing search result-like pages to be crawled
      • Allowing a large number of auto-generated pages with the same or only slightly different content to be crawled
      • Allowing URLs created as a result of proxy services to be crawled
  • rel="nofollow"
    • Use rel="nofollow" in the <a> tag to tell Google that certain links on your site shouldn’t be followed (if a site has a blog with public commenting turned on, links within those comments could pass the site’s reputation to pages it may not be comfortable vouching for)
    • Use <meta name="robots" content="nofollow"> to tell Google not to follow any of the links on a page
  • Off-site Optimization
    • Good practices:
      • Blog about new content or services
      • Don’t forget about offline promotion (e.g., business card, newsletter, etc.)
      • Know about social media sites
      • Join Google’s Local Business Center
      • Reach out to those in your site’s related community
    • Avoid:
      • Attempting to promote each new, small piece of content; go for big, interesting items
      • Involving the site in schemes where the content is artificially promoted to the top of these services
      • Spamming link requests out to all sites related to the topic area
      • Purchasing links from another site with the aim of getting PageRank instead of traffic
    • XML sitemap
      • Create a Google Webmaster Tools account and submit the Sitemap from time to time
      • Yahoo! & MSN search engines now use the same XML format
      • Creating sitemap: Google Webmaster Tools, xml-sitemaps.com
    • RSS Feeds
      • An acronym for Really Simple Syndication or Rich Site Summary
      • RSS (noun) – an XML format for distributing updated information on the Web
      • Update multiple RSS online directories
      • Perform bulk updates (e.g., Pingomatic XML-RPC call)
    • DMOZ 
      • Google takes URLs from DMOZ, matches them with their PageRank numbers on Google, and copies them into the Google Directory
      • The Internet’s largest directory system containing over 4 million web sites
      • Maintained by human editors
      • DMOZ submission
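
Putting several of the guide’s on-page recommendations together, here is a sketch of a single page for the guide’s hypothetical baseball-card site (brandonsbaseballcards.com); the file names, titles, and text are illustrative assumptions. It shows a unique descriptive <title>, a description meta tag, breadcrumb navigation, brief descriptive anchor text, an image with short alt text, and a user-submitted comment link marked rel="nofollow":

```html
<html>
  <head>
    <!-- Unique, brief but descriptive title for this particular page -->
    <title>1952 Topps Cards - Brandon's Baseball Cards</title>
    <!-- One- or two-sentence summary that Google may use as the snippet -->
    <meta name="Description" content="Condition notes and current prices for 1952 Topps baseball cards.">
  </head>
  <body>
    <!-- Breadcrumb navigation back to more general sections -->
    <p><a href="/">Home</a> &gt; <a href="/cards/">Cards</a> &gt; 1952 Topps</p>
    <h1>1952 Topps Baseball Cards</h1>
    <!-- Brief, descriptive image file name and alt text -->
    <img src="/images/1952-topps-mantle.jpg" alt="1952 Topps Mickey Mantle card">
    <!-- Short, descriptive anchor text instead of "click here" -->
    <p>See the <a href="/cards/1952-topps/price-guide.html">1952 Topps price guide</a>.</p>
    <!-- Visitor comment link: rel="nofollow" keeps it from passing the site's reputation -->
    <p>Visitor tip: <a href="http://www.example.com/" rel="nofollow">example.com</a></p>
  </body>
</html>
```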
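A minimal robots.txt sketch for the same hypothetical site, placed in the root directory. It blocks an assumed internal search-results directory from being crawled and points crawlers at the XML Sitemap; remember that sensitive content still needs more secure protection than robots.txt:

```
# Applies to all crawlers
User-agent: *
# Keep search-result-like pages out of the crawl
Disallow: /search-results/
# Sitemap autodiscovery (also submit it through Google Webmaster Tools)
Sitemap: http://www.brandonsbaseballcards.com/sitemap.xml
```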
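Finally, a small XML Sitemap sketch in the sitemaps.org format accepted by Google, Yahoo!, and MSN; the URLs and dates are illustrative, and only <loc> is required for each entry:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.brandonsbaseballcards.com/</loc>
    <lastmod>2010-04-10</lastmod>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>http://www.brandonsbaseballcards.com/cards/1952-topps/</loc>
  </url>
</urlset>
```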
