Why add a date to your content?
Everyone wants to see the most current content. So why date your content when the server automatically tells the browser that everything is ‘brand new’ whenever it serves an active page (i.e. PHP, ASP, JSP, etc.)? By putting a date on your content you’re admitting that it’s not sparkling new, but was written some time ago. You might expect search engines to value it less than undated active content, but actually it’s the other way around.
Search engines know that active pages always return the current date. Keeping that in mind, they know they need to compare older versions of the same page to see whether any changes have been made. By dating your content you are telling the search engine that you’ve made changes and that it should crawl the page again and save the new content. Without a given date, the search engine has to decide which changes are relevant (like when you change that "is not" to "is always"…) and which aren’t (like the date, current user listing, rotating banner, etc.). So, by dating your content, you’re actually helping search engines keep your pages current in their indexes.
Content dating and Google Sitemaps
The standard XML-formatted Google Sitemap provides a tag for specifying the last modified date of a URL ("lastmod"). This tag is optional, but if it is specified, it makes sense to give the date of the content itself (not the date the server returns with the active page).
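As a rough illustration of what such an entry could look like, here is a minimal Python sketch that builds a single sitemap "url" element with a "lastmod" value taken from the content date. The function name url_entry and the example URL are made up for this illustration; the sketch is not how the GSiteCrawler itself writes its files, and it leaves out the enclosing "urlset" element and its namespace.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def url_entry(loc: str, content_date: datetime) -> str:
    """Return one <url> element as a string (hypothetical helper).

    content_date must be timezone-aware; it is written in GMT/UTC.
    """
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    # W3C date-time format with an explicit +00:00 offset.
    stamp = content_date.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S+00:00")
    ET.SubElement(url, "lastmod").text = stamp
    return ET.tostring(url, encoding="unicode")


# Content dated 8AM GMT, August 1st, 2005 (example URL is an assumption).
entry = url_entry(
    "http://www.example.com/page.php",
    datetime(2005, 8, 1, 8, 0, 0, tzinfo=timezone.utc),
)
print(entry)
```

The point of the sketch is only that "lastmod" carries the date of the content, not the moment the server happened to generate the page.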
Content dating and the SOFTplus GSiteCrawler
The SOFTplus GSiteCrawler will recognize certain meta-tags which the webmaster can use to specify the date of the content. These tags are (all examples are for August 1st, 2005, and the tags are listed in order of precedence, i.e. later tags override earlier ones should multiple tags be specified in the document):
- date, e.g. <meta name="date" content="2005-08-01">
- dc.date, e.g. <meta name="dc.date" content="2005-08-01">
- dc.date.created, e.g. <meta name="dc.date.created" content="2005-08-01">
- dc.date.modified, e.g. <meta name="dc.date.modified" content="2005-08-01">
- dc.date.x-metadatalastmodified, e.g. <meta name="dc.date.x-metadatalastmodified" content="2005-08-01">
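The precedence rule can be sketched in a few lines of Python. This is only an illustration of the ordering described above, not the GSiteCrawler's own code; the names DATE_TAGS and pick_date_tag are invented here, and treating tag names as case-insensitive is an assumption.

```python
from typing import Dict, Optional, Tuple

# Tag names in the order listed above; later entries override earlier ones.
DATE_TAGS = [
    "date",
    "dc.date",
    "dc.date.created",
    "dc.date.modified",
    "dc.date.x-metadatalastmodified",
]


def pick_date_tag(found: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (name, content) of the highest-precedence date tag present.

    `found` maps meta-tag names to their content values.
    Assumption: names are compared case-insensitively.
    """
    lowered = {name.lower(): value for name, value in found.items()}
    chosen = None
    for name in DATE_TAGS:
        value = lowered.get(name)
        if value:
            chosen = (name, value)  # a later tag replaces an earlier one
    return chosen


tags = {"date": "2005-07-15", "dc.date.modified": "2005-08-01"}
print(pick_date_tag(tags))  # ('dc.date.modified', '2005-08-01')
```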
The date can be specified in different standards (schemes) and may include a time (with time-zone). The GSiteCrawler accepts the following ways of specifying the scheme:
- <meta name="[date-tag]" scheme="[SCHEME]" content="[date-time]">
- <meta name="[date-tag]" content="(scheme=[SCHEME])[date-time]">
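To make the two notations concrete, the following Python sketch collects date meta-tags from a page and separates the scheme from the date value. It uses the standard library's html.parser; the class MetaDateCollector and the helper split_scheme are hypothetical names. Two choices are assumptions on my part: that a scheme attribute wins over an inline "(scheme=...)" prefix if both are present, and that ISO8601 is filled in when no scheme is given (the default scheme named in the list of supported schemes).

```python
import re
from html.parser import HTMLParser

# Matches the inline form: content="(scheme=[SCHEME])[date-time]"
INLINE_SCHEME = re.compile(r"^\(scheme=([^)]+)\)(.*)$", re.IGNORECASE)


def split_scheme(attrs: dict) -> tuple:
    """Return (SCHEME, date_text) for one meta-tag's attributes."""
    content = (attrs.get("content") or "").strip()
    scheme = attrs.get("scheme")
    match = INLINE_SCHEME.match(content)
    if match:
        # Assumption: an explicit scheme attribute takes priority.
        scheme = scheme or match.group(1)
        content = match.group(2).strip()
    return (scheme or "ISO8601").upper(), content


class MetaDateCollector(HTMLParser):
    """Collects 'date' and 'dc.date*' meta-tags into self.found."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = (attr_map.get("name") or "").lower()
        if name == "date" or name.startswith("dc.date"):
            self.found[name] = split_scheme(attr_map)


collector = MetaDateCollector()
collector.feed(
    '<meta name="dc.date" scheme="ISO8601" content="2005-08-01T08:00:00+00:00">'
    '<meta name="dc.date.modified" content="(scheme=IETF.RFC822)Mon, 01 Aug 2005">'
)
print(collector.found)
```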
The following schemes are supported by the GSiteCrawler (all examples for 8AM GMT, August 1st, 2005):
- ISO8601, DCTERMS.W3CDTF, W3CDTF, ISO.31-1:1992 (default scheme if none specified)
Examples: "2005-08-01T08:00:00+00:00", "2005-08-01T08:00:00Z", without time: "2005-08-01" (assumes midnight GMT), without day: "2005-08" (assumes day 1), without month: "2005" (assumes month 1)
- RFC822, IETF.RFC822
Examples: "Mon, 01 Aug 2005 08:00:00 +0000", without time: "Mon, 01 Aug 2005"
- FGDC, ANSI.X3.30-1985
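A parser for the first two scheme families might look like the Python sketch below. It is an approximation built on the standard library, not the GSiteCrawler's implementation. The function names are invented, treating a date-only RFC822 value as midnight GMT is an assumption carried over from the ISO rule, and the FGDC scheme is deliberately left out because no example format is given for it here.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

ISO_SCHEMES = {"ISO8601", "DCTERMS.W3CDTF", "W3CDTF", "ISO.31-1:1992"}
RFC822_SCHEMES = {"RFC822", "IETF.RFC822"}


def _as_utc(dt: datetime) -> datetime:
    # Values without a time zone are taken as GMT.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def parse_w3cdtf(text: str) -> datetime:
    text = text.strip()
    if re.fullmatch(r"\d{4}", text):          # "2005" -> month 1, day 1
        text += "-01-01"
    elif re.fullmatch(r"\d{4}-\d{2}", text):  # "2005-08" -> day 1
        text += "-01"
    if text.endswith("Z"):                    # older Pythons reject "Z"
        text = text[:-1] + "+00:00"
    return _as_utc(datetime.fromisoformat(text))


def parse_rfc822(text: str) -> datetime:
    text = text.strip()
    if ":" not in text:
        # Assumption: a date without a time means midnight GMT.
        text += " 00:00:00 +0000"
    return _as_utc(parsedate_to_datetime(text))


def parse_meta_date(value: str, scheme: str = "ISO8601") -> datetime:
    scheme = scheme.upper()
    if scheme in ISO_SCHEMES:
        return parse_w3cdtf(value)
    if scheme in RFC822_SCHEMES:
        return parse_rfc822(value)
    raise ValueError("scheme not handled in this sketch: " + scheme)


print(parse_meta_date("2005-08-01T08:00:00Z"))
print(parse_meta_date("Mon, 01 Aug 2005 08:00:00 +0000", "RFC822"))
```

Both calls should yield the same instant, 8AM GMT on August 1st, 2005, which is a quick way to check that the two notations agree.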
Some examples of complete tags:
- <meta name="dc.date" content="2005-08-01">
- <meta name="dc.date" scheme="ISO8601" content="2005-08-01T08:00:00+00:00">
- <meta name="dc.date" content="(scheme=ISO8601)2005-08">
- <meta name="dc.date" content="(scheme=IETF.RFC822)Mon, 01 Aug 2005">
How the GSiteCrawler determines the date of the current page
Your server will (almost) always return a last-modified date along with the content of your page. If the page also contains a valid date meta-tag, the GSiteCrawler will assume that the date in the meta-tag is the actual date of the content and use that for further processing (i.e. for generating the Google Sitemap file). A date meta-tag is valid if it can be parsed properly and if it results in a date older than "now" (current date + time) and newer than January 1st, 1990. If several date meta-tags are used, it will use the order of precedence listed above (e.g. if both ‘date’ and ‘dc.date.modified’ are specified and valid, it uses ‘dc.date.modified’).
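The validity check and the fallback to the server's date can be summarized in a short Python sketch. The names is_valid_content_date and effective_date are hypothetical, and reading "January 1st, 1990" as midnight GMT is an assumption; the real tool may draw the boundary slightly differently.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumption: the lower bound is midnight GMT on January 1st, 1990.
EARLIEST = datetime(1990, 1, 1, tzinfo=timezone.utc)


def is_valid_content_date(dt: datetime, now: Optional[datetime] = None) -> bool:
    """True if dt is newer than January 1st, 1990 and older than 'now'."""
    now = now or datetime.now(timezone.utc)
    return EARLIEST < dt < now


def effective_date(
    meta_date: Optional[datetime],
    server_last_modified: datetime,
    now: Optional[datetime] = None,
) -> datetime:
    """Prefer a valid meta-tag date; otherwise use the server's date.

    meta_date is None when no date meta-tag could be parsed.
    """
    if meta_date is not None and is_valid_content_date(meta_date, now):
        return meta_date
    return server_last_modified


served = datetime(2005, 9, 1, 12, 0, tzinfo=timezone.utc)
content = datetime(2005, 8, 1, 8, 0, tzinfo=timezone.utc)
print(effective_date(content, served, now=served))
```

Passing "now" explicitly is only there to make the sketch easy to test; in normal use it would default to the current time.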
Testing the GSiteCrawler with date meta-tags
To test the GSiteCrawler (or any other Google Sitemap Generator), feel free to use my test-set: