Time Zone Issues


Accurately determining the publication date and time of feed articles can be fraught with danger. The problems trace back to two sources: the loose definition of date/time formats in early RSS feeds, and sloppiness and ignorance on the part of feed publishers.


RSS was a de facto standard before it was a formal one, and as such, many of its elements were only loosely specified, or not at all. Some conventions have prevailed, but even ones as basic as the text representation of dates and times are often abused or ignored. Originally, date/times in RSS feeds were supposed to adhere to the format specified in Internet RFC822 from 1982. However, "supposed to" was the operative phrase here, since no formal specification for RSS existed and some publishers just used other date/time formats they liked better. Eventually a "pseudo" RFC822 came into common usage with RSS, which mimicked, but didn't strictly adhere to, the date format prescribed by the RFC. In particular, RFC822 says a date should look like this:


Mon, 15 Aug 05 15:52:01 GMT


Note that it begins with an abbreviated day of the week, which is optional, but usually included. Many date parsing functions expect it and don't interpret the date properly without it. Next is the day of the month, then the abbreviated month, followed by the year. Notice that the year is only two digits, 1982 being long before Y2K worries were taken seriously. Next in the string is the time, in HH:MM:SS format. Finally a time zone or offset is appended to the end. RFC822 was an ARPA specification, therefore it only bothered to define time zone abbreviations from North America. Alternatively, the suffix could be a single-letter military code (written 1ALPHA in the RFC's grammar), where each hour offset from GMT is defined by a single capital letter. Finally, the offset from GMT could be specified using +/-HHMM, which is widely used today to specify date/times in email headers.
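
As a rough illustration of the format described above, the following Python sketch parses two RFC822-style dates with the standard library's email.utils.parsedate_to_datetime function. It is only an example of how one general-purpose parser treats the two-digit year and the +/-HHMM offset, not a description of how multiFEED parses dates. The variable names and the second sample date are made up for the example.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Day of week, two-digit year, and the GMT abbreviation, as in the example above.
classic = parsedate_to_datetime("Mon, 15 Aug 05 15:52:01 GMT")

# No day of week, four-digit year, numeric offset in the email-header style.
# (This second date is an invented sample.)
offset_form = parsedate_to_datetime("15 Aug 2005 15:52:01 +0200")

# Python's parser maps the two-digit year 05 to 2005.
print(classic.isoformat())        # 2005-08-15T15:52:01+00:00
print(offset_form.utcoffset())    # 2:00:00
```

Other libraries may resolve two-digit years differently or reject a date that lacks the day of the week, so the result should not be assumed to carry over to other parsers.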


Not only was there huge room for variation in the format of an RFC822 compliant date and time, but the US-centric expectation that only North American time zone abbreviations would be used made the format extremely ambiguous when used elsewhere in the world. Many of the North American abbreviations are used by other nations for completely different time zones. For instance, AST could mean Atlantic Standard Time in North America (GMT-0400), or Arabia Standard Time (GMT+0300). Even worse, CST could refer to Central Standard Time (North America, GMT-0600), China Standard Time (GMT+0800), Central Standard Time (Australia, GMT+0930), Central Summer Time (Australia, GMT+1030), or Cuba Standard Time (GMT-0500). Because of the extreme ambiguity of time zone abbreviations in a global scenario, RSS convention came to insist that publication dates always be specified as GMT. In fact the abbreviation duplication problem is so intractable that RFC822 date parsing functions in almost all popular programming libraries, such as Microsoft's .NET, will not accept a date that has a time zone other than GMT.
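
The ambiguity can be pictured as a lookup table in which one abbreviation maps to several offsets. The Python sketch below uses only the offsets listed in the paragraph above; the table and the helper names (AMBIGUOUS_ZONES, candidate_offsets, is_ambiguous) are hypothetical and the table is far from complete.

```python
from datetime import timedelta

# Offsets copied from the examples in the text; not a complete table.
AMBIGUOUS_ZONES = {
    "AST": [timedelta(hours=-4), timedelta(hours=3)],
    "CST": [
        timedelta(hours=-6),              # Central Standard Time (North America)
        timedelta(hours=8),               # China Standard Time
        timedelta(hours=9, minutes=30),   # Central Standard Time (Australia)
        timedelta(hours=10, minutes=30),  # Central Summer Time (Australia)
        timedelta(hours=-5),              # Cuba Standard Time
    ],
    "GMT": [timedelta(0)],
}

def candidate_offsets(abbreviation):
    """Return every UTC offset the abbreviation could mean (empty if unknown)."""
    return AMBIGUOUS_ZONES.get(abbreviation.upper(), [])

def is_ambiguous(abbreviation):
    """True when the abbreviation maps to more than one offset."""
    return len(candidate_offsets(abbreviation)) > 1

print(is_ambiguous("CST"))  # True
print(is_ambiguous("GMT"))  # False
```

A parser that meets "CST" has no way to choose among the five candidates from the date string alone, which is why refusing anything but GMT is a common design choice.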


If all RSS publishers adhered to the convention and used only GMT date/times with an explicit day of week abbreviation at the beginning, all would be well. However, whether by ignorance or arrogance, a great many of them insist on using other time zone abbreviations, perhaps naively expecting all their readers to be local. Many of the largest North American and European publishers with impressive IT departments do this, eliminating inexperience as an excuse.


To overcome these issues, and others, an IETF working group developed Atom, a new standard for XML news feeds. The Atom Syndication Format was published in 2005 as RFC4287, and the companion Atom Publishing Protocol followed in 2007 as RFC5023. Among other welcome improvements over RSS, Atom feeds use RFC3339 to define date formats. An RFC3339 date looks like this:


1990-12-31T23:59:60Z


Note that the year has four digits, unlike RFC822. Also, the elements are ordered from the largest unit (year) to the smallest (second), so timestamps that share the same offset sort chronologically as plain text. The "Z" on the end is a special character that marks the date/time as UTC (Coordinated Universal Time, equivalent to GMT). RFC3339 also allows explicit offsets from UTC in the form +/-HH:MM in place of the Z. If both the Z and the offset are missing, the date/time can only be treated as a local time, and becomes ambiguous without the proper context. The "T" between the date and time is a holdover from ISO8601, an earlier standard that RFC3339 refers to. In ISO8601 the "T" is required, but RFC3339 lets applications put another character in that position, and a space is often used to improve readability.
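
A minimal Python sketch of reading such a date follows. The helper name parse_rfc3339 and the sample timestamps are invented for the example, and the format string assumes whole seconds, a "T" separator, and Python 3.7 or later (where %z accepts both "Z" and +/-HH:MM). It is not a full RFC3339 parser.

```python
from datetime import datetime

RFC3339_FORMAT = "%Y-%m-%dT%H:%M:%S%z"

def parse_rfc3339(text):
    """Hypothetical helper: parse a whole-second RFC3339 timestamp."""
    return datetime.strptime(text, RFC3339_FORMAT)

utc_stamp = parse_rfc3339("2005-08-15T15:52:01Z")
offset_stamp = parse_rfc3339("2005-08-15T17:52:01+02:00")

# Both strings name the same instant, so the parsed values compare equal.
print(utc_stamp == offset_stamp)  # True

# When every timestamp uses the same offset, sorting the raw strings
# gives the same order as sorting the parsed values.
stamps = ["2005-08-15T15:52:01Z", "1990-12-31T23:59:59Z", "2005-08-15T09:00:00Z"]
print(sorted(stamps) == sorted(stamps, key=parse_rfc3339))  # True
```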


You would think that with such a robust specification, parsing date/times in Atom feeds would be easy, but once again publisher negligence complicates things. Some decide to use RFC822/email style offsets from UTC, that is +/-HHMM rather than +/-HH:MM, which again confuses standard library parsing functions. Occasionally even time zone abbreviations are appended, even though RFC3339 makes no allowance for them.
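
One way a reader might cope with the +/-HHMM variant is to rewrite the offset before handing the string to a strict parser. The Python sketch below shows that idea under simple assumptions: the offset is the last thing in the string, and the names COMPACT_OFFSET and normalize_offset are hypothetical. It does not attempt to handle appended time zone abbreviations.

```python
import re

# Matches a trailing +HHMM or -HHMM offset that has no colon.
COMPACT_OFFSET = re.compile(r"([+-]\d{2})(\d{2})$")

def normalize_offset(text):
    """Rewrite a trailing +HHMM offset as +HH:MM; leave other strings alone."""
    return COMPACT_OFFSET.sub(r"\1:\2", text.strip())

print(normalize_offset("2005-08-15T15:52:01+0200"))   # 2005-08-15T15:52:01+02:00
print(normalize_offset("2005-08-15T15:52:01Z"))       # 2005-08-15T15:52:01Z
```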


What this all means is that extracting date/times from RSS and Atom feeds is not an exact science, and although multiFEED attempts to decipher the publisher's intent, sometimes it will make a mistake. If multiFEED is unable to parse the time zone or offset suffix correctly, it will ignore it completely and assume the rest of the characters describe a GMT/UTC date/time. This is probably not what you want. If you know publication date/times in a feed you follow with multiFEED are not being interpreted correctly, you should use the time zone override feature of multiFEED to explicitly tell it what UTC offset to use with the date/times.
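
The effect of the fallback and of an override can be sketched in Python as follows. This is not multiFEED's code: the function apply_zone, its parameters, and the rule that an override takes precedence over a parsed offset are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

def apply_zone(naive_stamp, parsed_offset=None, override_offset=None):
    """Attach a UTC offset to a date/time whose zone suffix may be unusable.

    naive_stamp: datetime built from the date and time portion only.
    parsed_offset: timedelta read from the suffix, or None if unparseable.
    override_offset: timedelta chosen by the user, or None (assumed to win).
    """
    if override_offset is not None:
        offset = override_offset
    elif parsed_offset is not None:
        offset = parsed_offset
    else:
        offset = timedelta(0)  # unparseable suffix: treat the value as GMT/UTC
    return naive_stamp.replace(tzinfo=timezone(offset))

stamp = datetime(2005, 8, 15, 15, 52, 1)
assumed_utc = apply_zone(stamp)
corrected = apply_zone(stamp, override_offset=timedelta(hours=-6))

# A feed really published at GMT-0600 would be shown six hours too early
# if its suffix were dropped and the time taken as UTC.
print(corrected - assumed_utc)  # 6:00:00
```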


NOTE: A high-profile Atom feed that incorrectly uses RFC822 RSS date formatting rather than RFC3339 Atom formatting has recently been seen in the wild. As of version 2.2.1 multiFEED will try to parse Atom feed dates as RFC3339, and if that fails it will attempt to read the date as RFC822 format instead.
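
The try-one-format-then-the-other approach can be sketched in Python like this. The function name parse_atom_date is hypothetical, the RFC3339 branch only handles whole seconds with a "T" separator, and the code illustrates the general fallback pattern rather than multiFEED's actual implementation.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime

def parse_atom_date(text):
    """Try RFC3339 first, then fall back to RFC822; return None if both fail."""
    try:
        return datetime.strptime(text.strip(), "%Y-%m-%dT%H:%M:%S%z")
    except ValueError:
        pass
    try:
        return parsedate_to_datetime(text)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError for unparseable input,
        # newer ones raise ValueError.
        return None

proper = parse_atom_date("2005-08-15T15:52:01Z")
rss_style = parse_atom_date("Mon, 15 Aug 2005 15:52:01 GMT")
print(proper == rss_style)             # True
print(parse_atom_date("not a date"))   # None
```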