The importance of HTML in SEO

[Image: HTML code - hello world]

HTML is a ‘mark-up language’ used to describe the content and layout of web pages to browsers and (importantly) search engines. HTML includes: the textual content for a web page; layout information (in combination with CSS); the page title and descriptive meta tags; references to media (images, videos, etc.); links to other pages; and descriptions for media and links.

If you use a Content Management System (CMS) or blog software, you may be shielded from the underlying HTML by a WYSIWYG (what you see is what you get) editor, which gives you a cut-down, word processor-style interface (bold, underline, styles, lists). Whilst this can be a handy time saver, it’s well worth having a look under the bonnet, since not all CMS editors create valid, search engine friendly HTML.

Ultimately, the HTML that makes up a web page has to be able to tell a search engine what is on the page accurately and efficiently.

Spiders and HTML

Search engines send out spiders (sometimes called bots), which are automated tools used to crawl web pages. They read the HTML code for each web page they find and index it (save the important parts in a database) so the page can be included in search results.

Valid HTML

[Image: HTML snippet missing closing tag]

HTML and XHTML both use tags and attributes to encapsulate text and represent different parts of the web page. XHTML is essentially a stricter version of HTML, reformulated as an application of XML (Extensible Markup Language). Tags are abbreviations (like p for paragraph or img for image) enclosed in less than and greater than signs, as seen in the image above. In general, tags have an opening and a closing part. If tags are not formed properly, or a closing tag is missing, spiders (and potentially browsers) may find it difficult to read the page, so it pays to be careful with your coding.
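To make the missing closing tag problem concrete, here is a minimal sketch in Python using the standard library's html.parser module. The names UnclosedTagFinder and find_unclosed_tags are made up for illustration. It is a rough check rather than a real validator: it assumes every non-void element should be closed explicitly, although HTML does allow some closing tags (such as p and li) to be left out.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class UnclosedTagFinder(HTMLParser):
    """Tracks open tags on a stack and records any that are never closed."""

    def __init__(self):
        super().__init__()
        self.stack = []      # tags opened but not yet closed
        self.unclosed = []   # tags that were skipped over by a later closing tag

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Anything opened after the matching tag was never closed.
            while self.stack[-1] != tag:
                self.unclosed.append(self.stack.pop())
            self.stack.pop()


def find_unclosed_tags(html):
    """Return the names of tags that were opened but never closed."""
    finder = UnclosedTagFinder()
    finder.feed(html)
    finder.close()
    # Tags still on the stack at the end of the document were never closed either.
    return finder.unclosed + finder.stack


print(find_unclosed_tags("<div><p>Hello <b>world</p></div>"))
```

A browser would probably still display this snippet sensibly, which is exactly why a mechanical check is useful.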

Missing tags are only one of many potential pitfalls. Fortunately, there are a few tools that can help. Rather than using a basic text editor like Notepad to code HTML, it’s worth using a specialised code editor with built-in validation, such as those included in Dreamweaver or Visual Studio .NET, which will highlight mistakes as you type. Note that these particular tools also have WYSIWYG editors, so if you use them, be sure to check the code view to make sure the HTML produced is valid – it’s not always!

Once you’ve published your website to a public-facing URL, there’s a very thorough automated tool you can use to check its HTML validity. The W3C provides an HTML validator, which allows you to enter your website’s URL and run a test. Any problems are listed and explained, so you can correct them and retest.

Search engines aren’t too worried whether you choose HTML or XHTML, so long as you specify the correct ‘doctype’ at the top of your document and it validates correctly.

Bear in mind browsers can be quite forgiving, so don’t just assume search engines will be able to understand a page because it looks okay.


Accessibility

Web accessibility is a point related to valid HTML which, although not strictly part of SEO, is well worth bearing in mind. It is about allowing people with disabilities to use your website, by building your pages in such a way that users don’t have to be able to see images and videos or hear audio to digest the content. Automated spiders are currently unable to interpret visual and audio content, so by gearing your content up to be accessible, you’re helping search engines as well as users with disabilities. Following these tips from the W3C Web Accessibility Initiative guidelines will stand you in good stead.

  • Images & animations. Use the alt attribute to describe the function of each visual.
  • Image maps. Use the client-side map and text for hotspots.
  • Multimedia. Provide captioning and transcripts of audio, and descriptions of video.
  • Hypertext links. Use text that makes sense when read out of context. For example, avoid “click here.”
  • Page organization. Use headings, lists, and consistent structure. Use CSS for layout and style where possible.
  • Graphs & charts. Summarize or use the longdesc attribute.
  • Scripts, applets, & plug-ins. Provide alternative content in case active features are inaccessible or unsupported.
  • Frames. Use the noframes element and meaningful titles.
  • Tables. Make line-by-line reading sensible. Summarize.

You can also use WAVE’s web accessibility validation tool to help check accessibility on your web pages. Its view as text functionality is a good way of mimicking what a search engine spider can see.
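If you want a rough feel for what a text-only view involves, the sketch below strips a page down to its visible text plus image alt text, using Python's standard html.parser module. It is not how WAVE or any real spider works; the class and function names are invented for the example, and scripts and styles are simply skipped.

```python
from html.parser import HTMLParser


class TextView(HTMLParser):
    """Collects visible text and image alt text, skipping scripts and styles."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.parts.append(f"[image: {alt}]")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.parts.append(data.strip())


def text_view(html):
    """Return the page as a single line of plain text."""
    viewer = TextView()
    viewer.feed(html)
    viewer.close()
    return " ".join(viewer.parts)


page = ('<h1>Cakes</h1><img src="a.png" alt="Lemon cake">'
        '<script>var x = 1;</script><p>Fresh daily.</p>')
print(text_view(page))
```

An image without alt text contributes nothing to this view, which is the point the tips above are making.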

Content behind forms

Spiders may not be able to reach any content that can only be accessed by submitting a form. Quite simply, if content can’t be found, it won’t be indexed or ranked, so wherever possible, provide a standard anchor link to every page on your website. This is a simplified tip, and it’s worth putting a bit of thought into internal linking, as described more fully within the information architecture section of this website.
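As a simple illustration, the following Python sketch lists URLs on a single page that are only reachable through a form, with no standard anchor link pointing at them. The function name form_only_targets and the example URLs are hypothetical, and the check only compares form action attributes with link href attributes on one page, so treat it as a starting point rather than a crawl of your whole site.

```python
from html.parser import HTMLParser


class LinkAndFormCollector(HTMLParser):
    """Collects anchor link targets and form submission targets separately."""

    def __init__(self):
        super().__init__()
        self.link_targets = set()
        self.form_targets = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.link_targets.add(attrs["href"])
        elif tag == "form" and attrs.get("action"):
            self.form_targets.add(attrs["action"])


def form_only_targets(html):
    """Return URLs reachable via a form on this page but not via any anchor link."""
    collector = LinkAndFormCollector()
    collector.feed(html)
    collector.close()
    return sorted(collector.form_targets - collector.link_targets)


page = ('<a href="/about">About us</a>'
        '<form action="/search"><input name="q"></form>'
        '<form action="/about"></form>')
print(form_only_targets(page))
```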

Less code, more content – utilising CSS

[Image: Tables vs CSS - less code required]

The higher your code to content ratio, the harder it is for search engines to find your important content and give it the weight it deserves. The old-fashioned table-based layouts used by web developers of yesteryear are less efficient (not to mention clumsier and more of a pain to maintain) than using CSS (cascading style sheets) to describe layout.

Using CSS means the layout descriptions can be kept outside of the page in linked files, making the code for each page simpler. This in turn makes it easier for search engines to find the content they’re looking for, rather than wading through a lot of unnecessary HTML tags.

As a bonus, this separation of concerns makes pages easier to code, easier to re-purpose for different readers and reduces the page size, so it’s quicker for users to download and search engines to spider.
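One rough way to put a number on the code to content ratio is to measure what share of a page's characters are visible text. The Python sketch below does this with the standard html.parser module. The measure itself (content_share) is an assumption made for illustration, not a figure any search engine publishes, and a higher share corresponds to a lower code to content ratio.

```python
from html.parser import HTMLParser


class TextLength(HTMLParser):
    """Counts characters of visible text, ignoring scripts and styles."""

    def __init__(self):
        super().__init__()
        self.chars = 0
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chars += len(data.strip())


def content_share(html):
    """Fraction of the document's characters that are visible text (0.0 to 1.0)."""
    if not html:
        return 0.0
    counter = TextLength()
    counter.feed(html)
    counter.close()
    return counter.chars / len(html)


table_layout = '<table><tr><td><font size="2">Hello</font></td></tr></table>'
css_layout = '<p class="intro">Hello</p>'
print(content_share(table_layout) < content_share(css_layout))
```

Both snippets carry the same five letters of content, but the table version wraps them in far more markup.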

Heading Tags and Emphasis

Heading tags (h1, h2, etc.) allow visitors and search engines to better understand how the content on each page is organised and what the most important phrases are. As a rule, h1 tags should be used for the main page title, with lower h tags containing subtitles in a logical hierarchy. Using meaningful wording is important, as search engines generally give these phrases greater relative importance than standard text on a page. The same is true for strong and emphasised text. This topic is explored in more detail in the page titles and meta tags section of this website.
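A logical hierarchy is easy to check mechanically. The Python sketch below pulls out the headings from a page and reports two possible problems: a page without exactly one h1, and a heading level that is skipped (for instance an h1 followed directly by an h3). Both rules are assumptions chosen for the example rather than anything search engines formally require, and the function name heading_problems is invented.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingOutline(HTMLParser):
    """Builds a list of (level, text) pairs for the headings on a page."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self.current = None  # level of the heading being read, or None
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.current = int(tag[1])
            self.buffer = []

    def handle_data(self, data):
        if self.current is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self.current is not None:
            text = " ".join("".join(self.buffer).split())
            self.outline.append((self.current, text))
            self.current = None


def heading_problems(html):
    """Return a list of human-readable problems with the heading hierarchy."""
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    levels = [level for level, _ in parser.outline]
    problems = []
    if levels.count(1) != 1:
        problems.append(f"expected one h1, found {levels.count(1)}")
    for previous, level in zip(levels, levels[1:]):
        if level > previous + 1:
            problems.append(f"h{previous} is followed by h{level}")
    return problems


print(heading_problems("<h1>Cakes</h1><h3>Icing</h3>"))
```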

Linking and anchor tags

[Image: Examples of good (descriptive) and bad (click here) link text]

As mentioned in the list of accessibility tips above, when linking to pages with anchor tags, it is extremely important to use a descriptive phrase for the link text, rather than something arbitrary like ‘click here’. Search engines use this text to figure out what you’re linking to, so it’s imperative that this text describes the destination page. From an SEO point of view, this is particularly important when internally linking to pages within your website – you should use a keyword phrase that you’re trying to optimise the destination page for. This topic is explored in more detail in the information architecture section of this website.
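To show the difference between descriptive and arbitrary link text, here is a small Python sketch that flags links whose text is empty or matches a short list of vague phrases. The list in VAGUE_PHRASES is an assumption and would need tuning for a real site, and the function name vague_links is made up. For image links, the image's alt text is treated as the link text.

```python
from html.parser import HTMLParser

# Assumed list of phrases that say nothing about the destination page.
VAGUE_PHRASES = {"click here", "here", "read more", "more", "this link"}


class LinkTextCollector(HTMLParser):
    """Collects (href, link text) pairs for every anchor tag with an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.href = None
        self.buffer = None  # list of text pieces while inside a link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.href = attrs["href"]
            self.buffer = []
        elif tag == "img" and self.buffer is not None:
            # For image links, the alt text stands in for the link text.
            self.buffer.append(" " + (attrs.get("alt") or "") + " ")

    def handle_data(self, data):
        if self.buffer is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.buffer is not None:
            text = " ".join("".join(self.buffer).split())
            self.links.append((self.href, text))
            self.buffer = None


def vague_links(html):
    """Return the hrefs of links whose text is empty or a vague phrase."""
    collector = LinkTextCollector()
    collector.feed(html)
    collector.close()
    return [href for href, text in collector.links
            if not text or text.lower().strip(" .!") in VAGUE_PHRASES]


page = ('<a href="/cakes">Lemon cake recipes</a> '
        '<a href="/offers">Click here!</a>')
print(vague_links(page))
```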

There is some debate about whether the title attribute in anchor tags is superfluous. Certainly there’s little value in duplicating the anchor text, because any visitor (human or spider) can already see this. As a minimum, though, it could be handy for adding guidance for usability and, so long as it doesn’t dilute the meaning, it could be used for additional keywords.

Anchor tags can have the attribute rel="nofollow" specified, which instructs spiders not to follow these links. Again, there is some debate about the usefulness of this. There are two main reasons you might want to do this: 1) If you’re linking to a web page that you don’t want to give any credit to. The generally accepted example is comments on websites or blogs, where unscrupulous visitors may take the opportunity to add ‘link spam’, linking back to websites they’re trying to promote. 2) If you have summary pages or similar with duplicate content on your own website and you don’t want search engines (who generally don’t approve of duplicate content) to penalise you for this.
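If you want to audit which links on a page carry the attribute, a sketch along the following lines would do it. It is written in Python with the standard html.parser module, the names are invented for illustration, and it relies on the fact that rel holds a space-separated list of values, so rel="external nofollow" counts as nofollow too.

```python
from html.parser import HTMLParser


class NofollowSplitter(HTMLParser):
    """Sorts link targets into followed and nofollowed lists."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if tag != "a" or not href:
            return
        # rel is a space-separated list of tokens, compared case-insensitively.
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


def split_links(html):
    """Return (followed hrefs, nofollowed hrefs) for a page."""
    splitter = NofollowSplitter()
    splitter.feed(html)
    splitter.close()
    return splitter.followed, splitter.nofollowed


comment_section = ('<a href="/about">About</a>'
                   '<a rel="external nofollow" href="http://example.com/">A commenter</a>')
print(split_links(comment_section))
```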

Images and alt tags

As mentioned in the accessibility tips list, you should always include ‘alt’ attributes (often called alt tags) to describe your images. This is good practice for usability and accessibility, but also essential for SEO. Without these, search engines don’t know what’s in your image. When using images as links, this is especially important, as the alt text acts as the link text. It’s also worth giving your image files descriptive names, so they can be found by search engines and generate additional traffic back to your website.
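Checking for missing alt attributes is easy to automate. The Python sketch below lists the src of every image that has no alt attribute at all. The function name and file names are hypothetical, and an image with an empty alt attribute is counted as having one, so this only catches the most obvious omissions.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Records the src of every img tag that lacks an alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            if "alt" not in attrs:
                self.missing.append(attrs.get("src") or "(no src)")


def images_missing_alt(html):
    """Return the src values of images with no alt attribute."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


page = ('<img src="lemon-drizzle-cake.jpg" alt="Lemon drizzle cake">'
        '<img src="img0042.jpg">')
print(images_missing_alt(page))
```

The first image in the example also shows a descriptive file name, while the second shows the kind of camera-generated name that tells a search engine nothing.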

Page titles and Meta tags

The page title (as seen in the top of your browser) and meta tags are important elements of the HTML page structure used to help describe your page contents to search engines. These are discussed in the page titles and meta tags pages within the content area of this website.


JavaScript

JavaScript allows you to use a lot of very useful functionality, like AJAX (often used to make callbacks, cutting down on page reloads and improving website performance), user friendly navigation and handsome looking animation. The proportion of browsers with JavaScript enabled is estimated at around 95%, so these techniques can and should be used to improve your website. But you must remember that most search engine spiders don’t execute JavaScript and won’t be able to get to content that can only be reached that way. The golden rule, then, is to ensure that you don’t rely on JavaScript to allow users to access areas of your website. The easiest way to check this is to disable JavaScript in your browser and check you can still navigate around your website to all the content you want search engines to be able to find.
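Alongside the manual check, a script can pick out the most obvious offenders: links that go nowhere unless a script runs. The Python sketch below flags anchor tags whose href is a javascript: URL, or whose href is empty or "#" while an onclick handler does the real work. This is only a heuristic, it won't spot navigation built entirely by scripts, and loadMenu is a made-up function name used for the example.

```python
from html.parser import HTMLParser


class ScriptOnlyLinkFinder(HTMLParser):
    """Records anchor tags that lead nowhere unless a script runs."""

    def __init__(self):
        super().__init__()
        self.script_only = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = (attrs.get("href") or "").strip()
        uses_script_url = href.lower().startswith("javascript:")
        click_only = "onclick" in attrs and href in ("", "#")
        if uses_script_url or click_only:
            self.script_only.append(href or "(no href)")


def script_only_links(html):
    """Return the href of each link that depends on a script to work."""
    finder = ScriptOnlyLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.script_only


page = ('<a href="/menu">Menu</a>'
        '<a href="#" onclick="loadMenu()">Menu</a>'
        '<a href="javascript:loadMenu()">Menu</a>')
print(script_only_links(page))
```

Only the first link in the example gives a spider a real URL to follow.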

Google Rich Snippets

[Image: Google Rich Snippets for events]

Google displays ‘rich snippets’ for people, reviews, videos and events on its search results pages, displaying more detail about these specific items along with your listing. This can provide users with a better experience and potentially get you more exposure in organic search results pages, expanding your real estate from one link to several. This is achieved by adding some extra markup (spans and classes with specific names) to your HTML. Although the snippets are not guaranteed to be shown, it’s well worth adding this to your website, as a few relatively small changes to the HTML could have a lot of SEO potential. There are likely to be more announcements on new rich snippets formats from Google as time goes on.

Documentation on how to include the markup in your HTML using microformats or RDFa can be found in Google Answers covering reviews, people and events. They also provide a handy testing tool.
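To give a flavour of what ‘spans and classes with specific names’ looks like, the Python sketch below holds a small event marked up with class names from the hCalendar microformat (vevent, summary, dtstart, location) and reads the values back out. The event details are invented, the reader is a simplification that ignores nesting, and you should check Google's own documentation for the formats and properties it actually supports.

```python
from html.parser import HTMLParser

# A made-up event marked up with hCalendar microformat class names.
EVENT_HTML = """
<div class="vevent">
  <span class="summary">Cake baking workshop</span>
  <abbr class="dtstart" title="2010-06-12">12 June</abbr>
  <span class="location">Village hall</span>
</div>
"""

EVENT_PROPERTIES = {"summary", "dtstart", "location"}


class EventReader(HTMLParser):
    """Reads event properties from elements with known class names."""

    def __init__(self):
        super().__init__()
        self.event = {}
        self.current = None  # property whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        found = classes & EVENT_PROPERTIES
        if not found:
            return
        name = found.pop()
        if attrs.get("title"):
            # A machine-readable value can be carried in the title attribute.
            self.event[name] = attrs["title"]
            self.current = None
        else:
            self.current = name

    def handle_data(self, data):
        if self.current and data.strip():
            self.event.setdefault(self.current, data.strip())
            self.current = None


def read_event(html):
    """Return a dict of the event properties found in the markup."""
    reader = EventReader()
    reader.feed(html)
    reader.close()
    return reader.event


print(read_event(EVENT_HTML))
```

The visible text stays readable for visitors ("12 June"), while the class names and the title attribute give a machine an unambiguous version of the same facts.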

