The importance of HTML in SEO
HTML is a markup language used to describe the content and layout of web pages to browsers and (importantly) search engines. HTML includes: the textual content for a web page; layout information (in combination with CSS); page title and descriptive meta tags; references to media (images, videos, etc.); links to other pages; descriptions for media and links.
If you use a Content Management System or blog software, you may be shielded from the underlying HTML by a WYSIWYG (what you see is what you get) editor, which gives you a cut-down, word-processor-style interface (bold, underline, styles, lists). Whilst this can be a handy time saver, it’s well worth having a look to see what’s under the bonnet, since not all CMS editors create valid, search-engine-friendly HTML.
Ultimately, the HTML that makes up a web page has to be able to tell a search engine what is on the page accurately and efficiently.
Spiders and HTML
Search engines send out spiders (sometimes called bots), which are automated tools used to crawl web pages. They read the HTML code for each web page they find and index it (save the important parts in a database) for inclusion within searches.
HTML and XHTML both use tags and attributes to encapsulate text and represent different parts of the web page. XHTML is essentially a stricter version of HTML, reformulated to follow the rules of XML (Extensible Markup Language); HTML itself is not a subset of XML. Tags are abbreviations (like p for paragraph or img for image) enclosed in less than and greater than signs, as seen in the image above. In general, tags have an opening and closing part, although a few, such as img, stand alone. If tags are not formed properly or are missing their closing part, spiders (and potentially browsers) find it difficult to read the page, so it pays to be careful with your coding.
Missing tags are only one of many potential pitfalls. Fortunately there are a few tools that can help. Rather than using a basic text editor like Notepad to code HTML, it’s worth using a specialised code editor with built-in validation that will highlight mistakes as you type, as included in software like Dreamweaver or Visual Studio .NET. Note that these particular tools also have WYSIWYG editors, so if you use them be sure to check the code view to make sure the HTML produced is valid. It’s not always!
And once you’ve published your website to a public-facing URL, there’s a very thorough automated tool you can use to check your website’s HTML validity. The W3C provides an HTML validator which allows you to enter your website’s URL and run a test. Any problems are listed and explained, so you can correct them and retest.
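As a small illustration of opening and closing tags, the fragment below shows well-formed markup and, inside a comment, a malformed version of the same paragraph. The wording and the file name are invented for the example.

```html
<!-- Well formed: every opening tag has a matching closing tag, nested in order -->
<p>Our <strong>handmade oak tables</strong> are built to order.</p>

<!-- img has no closing part; XHTML closes it in place with a trailing slash -->
<img src="oak-table.jpg" alt="Handmade oak dining table" />

<!-- Malformed (do not copy): strong is never closed, so p closes too early
<p>Our <strong>handmade oak tables are built to order.</p>
-->
```

A browser will usually guess what the malformed version meant, but a validator will flag it.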
Search engines aren’t too worried whether you choose HTML or XHTML, so long as you specify the correct ‘doctype’ at the top of your document and the page validates correctly.
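To show where the doctype sits, here is a minimal sketch of a complete page. It assumes the short HTML5 doctype, which is a choice made for this example rather than a recommendation from the guidance above; XHTML and HTML 4.01 pages use longer declarations that name a specific DTD. The title and text are placeholders.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Handmade Oak Tables</title>
</head>
<body>
  <h1>Handmade oak tables</h1>
  <p>Page content goes here.</p>
</body>
</html>
```

Whichever doctype you pick, it must be the first thing in the document, and the rest of the markup has to follow that doctype’s rules to validate.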
Bear in mind browsers can be quite forgiving, so don’t just assume search engines will be able to understand a page because it looks okay.
Web accessibility is a point related to valid HTML which, although not strictly part of SEO, is well worth bearing in mind. It is about allowing people with disabilities to use your website by building your pages in such a way that users don’t have to be able to see images and videos or hear audio to digest the content. Automated spiders are largely unable to interpret visual and audio content, so by gearing your content up to be accessible, you’re helping search engines as well as users with disabilities. Following these tips from the W3C Web Accessibility Initiative guidelines will stand you in good stead.
- Images & animations. Use the alt attribute to describe the function of each visual.
- Image maps. Use the client-side map and text for hotspots.
- Multimedia. Provide captioning and transcripts of audio, and descriptions of video.
- Hypertext links. Use text that makes sense when read out of context. For example, avoid “click here.”
- Page organization. Use headings, lists, and consistent structure. Use CSS for layout and style where possible.
- Graphs & charts. Summarize or use the longdesc attribute.
- Scripts, applets, & plug-ins. Provide alternative content in case active features are inaccessible or unsupported.
- Frames. Use the noframes element and meaningful titles.
- Tables. Make line-by-line reading sensible. Summarize.
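A few of the tips above (image maps, multimedia and tables) might look like this in practice. This is only a sketch: the file names, URLs, coordinates and opening hours are made up for illustration.

```html
<!-- Image map: client-side map with alt text on each hotspot -->
<img src="store-regions.png" alt="Map of our store regions" usemap="#regions">
<map name="regions">
  <area shape="rect" coords="0,0,100,80" href="/stores/north" alt="Stores in the north">
  <area shape="rect" coords="0,80,100,160" href="/stores/south" alt="Stores in the south">
</map>

<!-- Multimedia: a text transcript linked next to the audio -->
<p><a href="/podcast/episode-1-transcript">Read the transcript of episode 1</a></p>

<!-- Tables: a caption and header cells so line-by-line reading makes sense -->
<table>
  <caption>Opening hours by store</caption>
  <tr><th scope="col">Store</th><th scope="col">Weekdays</th><th scope="col">Weekends</th></tr>
  <tr><th scope="row">North</th><td>9am to 5pm</td><td>10am to 4pm</td></tr>
</table>
```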
You can also use WAVE’s web accessibility validation tool to help validate accessibility on your web pages. Its view as text functionality is a good way of mimicking what a search engine spider can see.
Content behind forms
Any content that can only be reached by submitting a form may be inaccessible to spiders. Quite simply, if content can’t be found, it won’t be indexed or ranked, so wherever possible, provide a standard anchor link to every page on your website. This is a simplified tip, and it’s worth putting a bit of thought into internal linking, as described more fully within the information architecture section of this website.
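The difference can be sketched as follows, using hypothetical category pages. The first block hides its destinations behind a form submission; the second exposes the same pages as ordinary links that a spider can follow.

```html
<!-- Hard for spiders: pages only reachable by choosing an option and submitting -->
<form action="/products" method="get">
  <select name="category">
    <option value="tables">Tables</option>
    <option value="chairs">Chairs</option>
  </select>
  <input type="submit" value="Go">
</form>

<!-- Easy for spiders: the same destinations as standard anchor links -->
<ul>
  <li><a href="/products/tables">Oak tables</a></li>
  <li><a href="/products/chairs">Oak chairs</a></li>
</ul>
```

There is nothing wrong with keeping the form for visitors, provided the plain links exist somewhere on the site as well.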
Less code, more content – utilising CSS
The higher your code-to-content ratio, the harder it is for search engines to find your important content and give it the weight it deserves. The old-fashioned table-based layouts used by web developers of yesteryear are less efficient (not to mention clumsier and more of a pain to maintain) than using CSS (Cascading Style Sheets) to describe layout.
Using CSS means the layout descriptions can be kept outside of the page in linked files, making the code for each page simpler. This in turn makes it easier for search engines to find the content they’re looking for, rather than wading through a lot of unnecessary HTML tags.
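To make the contrast concrete, here is the same small piece of content marked up both ways. It is a simplified sketch; the stylesheet path and the id names are assumptions chosen for the example.

```html
<!-- Table-based layout: presentation is repeated inside every page -->
<table width="100%" cellpadding="0" cellspacing="0">
  <tr>
    <td width="200" bgcolor="#eeeeee"><font face="Arial" size="2">Menu</font></td>
    <td><font face="Arial" size="3">Our oak tables are built to order.</font></td>
  </tr>
</table>

<!-- CSS-based layout: the page carries the content, the rules live in a linked file -->
<!-- (the link element belongs in the head of the page) -->
<link rel="stylesheet" href="/css/site.css">
<div id="menu">Menu</div>
<div id="content"><p>Our oak tables are built to order.</p></div>
```

The widths, colours and fonts from the first version would move into site.css, where one copy serves every page.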
As a bonus, this separation of concerns makes pages easier to code, easier to re-purpose for different readers and reduces the page size, so it’s quicker for users to download and search engines to spider.
Heading Tags and Emphasis
Heading tags (h1, h2, etc.) allow visitors and search engines to better understand how the content on your page is organised and what the most important phrases are. As a rule, h1 tags should be used for the main page title, with lower-level heading tags containing subtitles in a logical hierarchy. Using meaningful wording is important, as search engines generally give these phrases greater relative importance than standard text on a page. The same is true for strong and emphasised text. This topic is explored in more detail in the page titles and meta tags section of this website.
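A logical heading hierarchy with some emphasised phrases might look something like this. The headings and wording are placeholders for an imaginary furniture site.

```html
<h1>Handmade oak tables</h1>
<p>We build <strong>solid oak dining tables</strong> to order.</p>

<h2>Dining tables</h2>
<h3>Extending dining tables</h3>

<h2>Coffee tables</h2>
<p>Every table is finished with <em>natural oils</em>.</p>
```

Note that there is a single h1, and that no level is skipped on the way down.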
Linking and anchor tags
As mentioned in the list of accessibility tips above, when linking to pages with anchor tags, it is extremely important to use a descriptive phrase for the link text, rather than something arbitrary like ‘click here’. Search engines use this text to figure out what you’re linking to, so it’s imperative that this text describes the destination page. From an SEO point of view, this is particularly important when internally linking to pages within your website – you should use a keyword phrase that you’re trying to optimise the destination page for. This topic is explored in more detail in the information architecture section of this website.
There is some debate about whether the title attribute in anchor tags is superfluous. Certainly there’s little value in duplicating the anchor text, because any visitor (human or spider) can already see this. As a minimum, though, it can be handy for adding guidance that aids usability and, so long as it doesn’t dilute the meaning, it could be used for additional keywords.
Anchor tags can have the attribute rel="nofollow" specified, which instructs spiders not to follow these links. Again there is some debate about the usefulness of this. There are two main reasons you might want to do this: 1) If you’re linking to a web page that you don’t want to give any credit to. The generally accepted example is for comments on websites or blogs, where unscrupulous visitors may take the opportunity to add ‘link spam’, linking back to websites they’re trying to promote. 2) If you have summary pages or similar with duplicate content on your own website and you don’t want search engines (who generally don’t approve of duplicate content) to penalise you for this.
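The points about anchor text, the title attribute and nofollow can be sketched in three short links. The URLs and wording are hypothetical.

```html
<!-- Descriptive anchor text, with a title that adds guidance rather than repeating it -->
<a href="/oak-dining-tables" title="Sizes, prices and delivery times">oak dining tables</a>

<!-- Arbitrary anchor text: says nothing about the destination -->
<a href="/oak-dining-tables">click here</a>

<!-- Link in a visitor comment that the site does not want to vouch for -->
<a href="http://example.com/" rel="nofollow">a commenter's website</a>
```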
Images and alt tags
As mentioned in the accessibility tips list, you should always include alt attributes (often loosely called ‘alt tags’) to describe your images. This is good practice for usability and accessibility, but also essential for SEO. Without these, search engines don’t know what’s in your image. When using images as links, this is especially important, as the alt text acts as the link text. It’s also worth giving your image files descriptive names so they can be found by search engines and generate additional traffic back to your website.
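For instance, with made-up file names and descriptions, a described image and an image used as a link could be written like this:

```html
<!-- Descriptive file name plus alt text describing the image -->
<img src="/images/extending-oak-dining-table.jpg" alt="Extending oak dining table seating eight">

<!-- Image used as a link: the alt text stands in for link text -->
<a href="/oak-dining-tables">
  <img src="/images/oak-dining-tables-range.jpg" alt="Browse our oak dining tables">
</a>
```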
Page titles and Meta tags
The page title (as seen at the top of your browser) and meta tags are important elements of the HTML page structure, used to help describe your page contents to search engines. These are discussed in the page titles and meta tags pages within the content area of this website.
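In the markup itself, both live in the head of the page. A minimal sketch, with an invented company name and description:

```html
<head>
  <title>Handmade Oak Dining Tables | Example Furniture Co.</title>
  <meta name="description" content="Solid oak dining tables, built to order and delivered across the UK.">
</head>
```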
Google Rich Snippets
Google displays ‘rich snippets’ for people, reviews, videos and events on its search results pages, displaying more detail about these specific items along with your listing. This can provide users with a better experience and potentially get you more exposure in organic search results pages, expanding your real estate from one link to several. This is achieved by adding some extra markup (spans and classes with specific names) to your HTML. Although the snippets are not guaranteed to be shown, it’s well worth adding this to your website, as a few relatively small changes to the HTML could have a lot of SEO potential. There are likely to be more announcements on new rich snippets formats from Google as time goes on.
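As a rough idea of what the extra markup involves, the sketch below marks up a review using class names from the microformats hReview convention, one of the formats that uses spans and classes in this way. The product, reviewer, date and rating are invented, and the formats Google supports change over time, so treat this as an illustration and check Google’s current documentation before relying on it.

```html
<div class="hreview">
  <span class="item"><span class="fn">Extending oak dining table</span></span>
  Reviewed by <span class="reviewer">Jane Smith</span>
  on <span class="dtreviewed">2010-03-14</span>.
  Rating: <span class="rating">4.5</span> out of 5.
  <span class="summary">Sturdy and beautifully finished.</span>
</div>
```

To a visitor this reads as an ordinary sentence; the class names are what tell a spider which part is the item, the reviewer and the rating.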