Knotty problems in date/time parsing and formatting and time zones


Page 1: Knotty problems in date/time parsing and formatting and time zones

© 2008 IBM Corporation

Knotty problems in date/time parsing and formatting and time zones

Yoshito Umaoka
IBM Globalization Center of Competency

32nd Internationalization and Unicode Conference

Page 2: Knotty problems in date/time parsing and formatting and time zones

2 IUC32: Knotty problems in date/time parsing and formatting and time zones © 2008 IBM Corporation

Agenda

Challenges for Implementing Date and Time UI
Understanding Time Zone Formatting and Parsing

Page 3: Knotty problems in date/time parsing and formatting and time zones


Challenges for Implementing Date and Time UI

Two examples
– Google Calendar
– IBM Lotus Notes
Walking through various requirements for displaying date and time
Solutions provided by CLDR
Design/Implementation Tips

Page 4: Knotty problems in date/time parsing and formatting and time zones


Google Calendar

Page 5: Knotty problems in date/time parsing and formatting and time zones


Lotus Notes 8 Calendar

Page 6: Knotty problems in date/time parsing and formatting and time zones


Date Format Types

Basic: July 27, 2008
Relative: Today

Basic: July 28, 2008
Relative: Tomorrow

Basic: August 3, 2008
Relative: August 3, 2008

Interval: July 27 – 28, 2008
Duration: 1 day

Interval: July 27 – August 3, 2008
Duration: 7 days

Page 7: Knotty problems in date/time parsing and formatting and time zones


Mini Calendar

Month
– Some locales use a different form when the month appears without a day number
– E.g. Polish: lipiec (nominative) vs. lipca (genitive)
  – lipiec 2008
  – 28 lipca 2008
Day of week
– Very short abbreviation
– Not always the first letter of the day of week name
– E.g. Chinese: 星期日 ⇒ 日
The first day of week
– Sunday is the first day of the week in many regions, but not in all of them

Page 8: Knotty problems in date/time parsing and formatting and time zones


Month/Day of Week Names in CLDR

3 different widths: wide / abbreviated / narrow
2 context types: format / stand-alone

Month name example - January

Locale   format (wide / abbreviated / narrow)   stand-alone (wide / abbreviated / narrow)
en_US    January / Jan / J                      January / Jan / J
pl_PL    stycznia / sty / s                     styczeń / sty / s
ru_RU    января / янв. / Я                      Январь / янв. / Я

Day of week name example - Sunday

Locale       format (wide / abbreviated / narrow)   stand-alone (wide / abbreviated / narrow)
en_US        Sunday / Sun / S                       Sunday / Sun / S
zh_Hans_CN   星期日 / 周日 / 日                      星期日 / 周日 / 日
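
The format / stand-alone distinction can be modeled as a two-level lookup. The following is a minimal Python sketch, assuming a hand-copied excerpt of the January names from the table above. `MONTH_NAMES`, `month_name`, `calendar_header` and `polish_full_date` are hypothetical names, and the Polish field order follows the “28 lipca 2008” example from the Mini Calendar slide rather than real CLDR pattern data.

```python
# Excerpt of CLDR-style names for January, copied from the table above.
# Layout: locale -> context -> width.
MONTH_NAMES = {
    "en_US": {
        "format": {"wide": "January", "abbreviated": "Jan", "narrow": "J"},
        "stand-alone": {"wide": "January", "abbreviated": "Jan", "narrow": "J"},
    },
    "pl_PL": {
        "format": {"wide": "stycznia", "abbreviated": "sty", "narrow": "s"},
        "stand-alone": {"wide": "styczeń", "abbreviated": "sty", "narrow": "s"},
    },
}


def month_name(locale, context, width):
    """Look up the January name for a locale, context and width."""
    return MONTH_NAMES[locale][context][width]


def calendar_header(locale, year):
    # A month shown without a day number uses the stand-alone form.
    return f"{month_name(locale, 'stand-alone', 'wide')} {year}"


def polish_full_date(day, year):
    # A month shown next to a day number uses the format (genitive) form.
    # The "day month year" order is hardcoded here for Polish only.
    return f"{day} {month_name('pl_PL', 'format', 'wide')} {year}"


print(calendar_header("pl_PL", 2008))   # styczeń 2008
print(polish_full_date(28, 2008))       # 28 stycznia 2008
```

For en_US both contexts return the same string, which is why code tested only in English tends to miss this distinction.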

Page 9: Knotty problems in date/time parsing and formatting and time zones


Date and Time Interval

When displaying a date interval, duplicated date fields can be stripped off.
– 3 possible patterns depending on the combination of start date and end date
  – July 20–26, 2008
  – July 20 – August 1, 2008
  – July 20, 2008 – July 19, 2009
– Different combination patterns in different locales
  – 20–26 July 2008
  – 20 July – 1 August 2008
  – 20 July 2008 – 19 July 2009

Page 10: Knotty problems in date/time parsing and formatting and time zones


Date/Time Interval in CLDR

  <intervalFormatItem id="yMMMd">
    <greatestDifference id="y">MMM d, yyyy – MMM d, yyyy</greatestDifference>
    <greatestDifference id="M">MMM d – MMM d, yyyy</greatestDifference>
    <greatestDifference id="d">MMM d–d, yyyy</greatestDifference>
  </intervalFormatItem>

Each <intervalFormatItem> is associated with a “skeleton” pattern and contains one or more patterns

A <greatestDifference> element contains a pattern that is used when the largest calendar field that differs between the two given dates matches its “id” attribute
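
The selection rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ICU's implementation: the three patterns are copied from the XML above, English month abbreviations are hardcoded, only the `y`, `M` and `d` fields are handled, and the pattern is split into its start-date and end-date halves at the first repeated field letter. All function names are hypothetical.

```python
import re
from datetime import date

# Patterns copied from the <intervalFormatItem id="yMMMd"> example above.
INTERVAL_PATTERNS = {
    "y": "MMM d, yyyy – MMM d, yyyy",
    "M": "MMM d – MMM d, yyyy",
    "d": "MMM d–d, yyyy",
}
MONTHS_ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def greatest_difference(start, end):
    """Return the largest calendar field in which the two dates differ."""
    if start.year != end.year:
        return "y"
    if start.month != end.month:
        return "M"
    return "d"  # also used when both dates are identical


def render_field(token, d):
    """Render one pattern field; only the widths used above are supported."""
    if token[0] == "y":
        return str(d.year)
    if token[0] == "M":
        return MONTHS_ABBR[d.month - 1]
    return str(d.day)


def format_interval(start, end):
    pattern = INTERVAL_PATTERNS[greatest_difference(start, end)]
    seen, current, out = set(), start, []
    for token in re.findall(r"y+|M+|d+|[^yMd]+", pattern):
        if token[0] in "yMd":
            if token[0] in seen:
                current = end  # first repeated field starts the end date
            seen.add(token[0])
            out.append(render_field(token, current))
        else:
            out.append(token)  # literal text is copied through
    return "".join(out)


print(format_interval(date(2008, 7, 20), date(2008, 7, 26)))
print(format_interval(date(2008, 7, 20), date(2008, 8, 1)))
print(format_interval(date(2008, 7, 20), date(2009, 7, 19)))
```

The three calls correspond to the three cases listed on the Date and Time Interval slide, with abbreviated month names.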

Page 11: Knotty problems in date/time parsing and formatting and time zones


Other Challenges

Various combinations of date fields and widths
– “Sat 7/26”
  – The UI needs to display a short format that includes month, day of month and day of week, but not year
– The pattern may change depending on the locale
  – “Sat 26/7” for en_GB
  – “7/26(土)” for ja_JP
Week number
– Week numbers are commonly used in European countries
– The way of calculating week numbers in a year may vary depending on local conventions

Page 12: Knotty problems in date/time parsing and formatting and time zones


Flexible Date Format Support in CLDR (1)

<availableFormats> contains various <dateFormatItem> elements
Each <dateFormatItem> has an id attribute representing a “skeleton”
A “skeleton” contains only field information, in a canonical order
A CLDR consumer provides a “skeleton”
– When the matching “skeleton” is available in the locale, the associated pattern is returned. If not, the closest match that contains all requested fields is returned.

  <availableFormats>
    <dateFormatItem id="MMMEd" draft="provisional">E d MMM</dateFormatItem>
    <dateFormatItem id="MMMMd" draft="provisional">d MMMM</dateFormatItem>
    <dateFormatItem id="MMdd" draft="provisional">dd/MM</dateFormatItem>
    <dateFormatItem id="Md" draft="provisional">d/M</dateFormatItem>
    <dateFormatItem id="yyMMM" draft="provisional">MMM yy</dateFormatItem>
    <dateFormatItem id="yyyyMM" draft="provisional">MM/yyyy</dateFormatItem>
    <dateFormatItem id="yyyyMMMM" draft="provisional">MMMM yyyy</dateFormatItem>
  </availableFormats>
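
A rough Python sketch of the lookup follows, using the <availableFormats> entries above as data. The exact-match step follows the slide; the "closest match" scoring (fewest extra fields, then smallest total width difference) is an assumption made for illustration and is simpler than what a real CLDR consumer such as ICU does. `field_widths` and `best_pattern` are hypothetical names.

```python
import re

# Skeleton -> pattern, copied from the <availableFormats> example above.
AVAILABLE_FORMATS = {
    "MMMEd": "E d MMM",
    "MMMMd": "d MMMM",
    "MMdd": "dd/MM",
    "Md": "d/M",
    "yyMMM": "MMM yy",
    "yyyyMM": "MM/yyyy",
    "yyyyMMMM": "MMMM yyyy",
}


def field_widths(skeleton):
    """Map each field letter to its width, e.g. 'MMMEd' -> M:3, E:1, d:1."""
    return {m.group(0)[0]: len(m.group(0))
            for m in re.finditer(r"([A-Za-z])\1*", skeleton)}


def best_pattern(requested, available=AVAILABLE_FORMATS):
    # 1. An exact skeleton match wins.
    if requested in available:
        return available[requested]
    # 2. Otherwise pick the closest skeleton that has every requested field.
    wanted = field_widths(requested)
    best = None
    for skeleton, pattern in available.items():
        have = field_widths(skeleton)
        if not set(wanted) <= set(have):
            continue  # candidate lacks a requested field
        extra_fields = len(set(have) - set(wanted))
        width_gap = sum(abs(have[f] - wanted[f]) for f in wanted)
        score = (extra_fields, width_gap)  # assumed distance metric
        if best is None or score < best[0]:
            best = (score, pattern)
    # 3. None means no candidate covers the request.
    return best[1] if best else None


print(best_pattern("Md"))    # exact match
print(best_pattern("MEd"))   # closest match containing M, E and d
print(best_pattern("Hm"))    # no candidate with hour and minute
```

A production implementation would also adjust the field widths in the returned pattern to the requested ones; this sketch returns the stored pattern unchanged.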

Page 13: Knotty problems in date/time parsing and formatting and time zones


Flexible Date Format Support in CLDR (2)

When no <dateFormatItem> element satisfies the matching criteria, use the rules defined by <appendItems> to append the missing fields to one of the existing formats.

  <appendItems>
    <appendItem request="Day">{0} ({2}: {1})</appendItem>
    <appendItem request="Day-Of-Week">{0} {1}</appendItem>
    <appendItem request="Era">{0} {1}</appendItem>
    <appendItem request="Hour">{0} ({2}: {1})</appendItem>
    <appendItem request="Minute">{0} ({2}: {1})</appendItem>
    <appendItem request="Month">{0} ({2}: {1})</appendItem>
    <appendItem request="Quarter">{0} ({2}: {1})</appendItem>
    <appendItem request="Second">{0} ({2}: {1})</appendItem>
    <appendItem request="Timezone">{0} {1}</appendItem>
    <appendItem request="Week">{0} ({2}: {1})</appendItem>
    <appendItem request="Year">{0} {1}</appendItem>
  </appendItems>

Page 14: Knotty problems in date/time parsing and formatting and time zones


Week Data in CLDR

<weekData>
– minDays: minimum days in the first week
– firstDay: first day in a week
– weekendStart/weekendEnd: start/end day of weekend

  <weekData>
    <minDays count="1" territories="001" />
    <minDays count="4" territories="AT BE CA CH DE DK FI FR IT LI LT LU MC MT NL NO SE SK" />
    <minDays count="4" territories="CD" draft="true" />

    <firstDay day="mon" territories="001" />
    <firstDay day="fri" territories="MV" />
    <firstDay day="sat" territories="AE AF BH DJ DZ EG ER ET IQ IR JO KE KW LB LY MA OM QA SA SD SO TN YE" />
    <firstDay day="sun" territories="AS AU AZ BW CA CN FO GE GL GU HK IE IL IS JM JP KG KR LA MH MN MO MP MT NZ PH PK SG TH TT TW UM US UZ VI ZA ZW" />
    <firstDay day="sun" territories="ET MW NG TJ" draft="true" />
    <firstDay day="sun" territories="GB" draft="true" alt="variant" references="Shorter Oxford Dictionary (5th edition, 2002)"/>
    <firstDay day="thu" territories="SY" />

    <weekendStart day="sat" territories="001"/>
    <weekendStart day="fri" territories="EG IL SY"/>
    <weekendStart day="sun" territories="IN"/>
    <weekendStart day="thu" territories="AE BH DZ IQ JO KW LB LY MA OM QA SA SD TN YE AF IR"/>

    <weekendEnd day="sun" territories="001"/>
    <weekendEnd day="fri" territories="AE BH DZ IQ JO KW LB LY MA OM QA SA SD TN YE AF IR"/>
    <weekendEnd day="sat" territories="EG IL SY"/>
  </weekData>
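
To show how minDays and firstDay change week numbers, here is a small Python sketch. It assumes the usual reading of the two values: a week starts on firstDay, and the week containing January 1 counts as week 1 only if at least minDays of its days fall in the new year. `week_of_year` is a hypothetical helper, not an ICU or CLDR API.

```python
from datetime import date

MONDAY, SUNDAY = 0, 6  # same numbering as date.weekday()


def week_of_year(d, first_day, min_days):
    """Week number of `d` within its own calendar year.

    first_day: weekday that starts a week (0 = Monday ... 6 = Sunday).
    min_days:  minimum number of days the first week must have in the year.
    Returns 0 when `d` falls before week 1, i.e. in the last week of the
    previous year. Late-December days that belong to week 1 of the next
    year are not detected by this sketch.
    """
    jan1 = date(d.year, 1, 1)
    # Days of the week containing January 1 that fall in the previous year.
    lead_days = (jan1.weekday() - first_day) % 7
    # Zero-based index of the week containing `d`; the week of Jan 1 is 0.
    week_index = ((d - jan1).days + lead_days) // 7
    # The week of January 1 is week 1 only if enough of it is in this year.
    first_week_counts = (7 - lead_days) >= min_days
    return week_index + 1 if first_week_counts else week_index


# January 1, 2010 was a Friday.
print(week_of_year(date(2010, 1, 1), MONDAY, 4))  # 0: still the previous year's last week
print(week_of_year(date(2010, 1, 1), SUNDAY, 1))  # 1
print(week_of_year(date(2010, 1, 4), MONDAY, 4))  # 1
print(week_of_year(date(2010, 1, 4), SUNDAY, 1))  # 2
```

With the data above, firstDay Monday and minDays 4 applies to territories such as DE, while US gets firstDay Sunday and the default minDays of 1, so the same date can carry different week numbers.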

Page 15: Knotty problems in date/time parsing and formatting and time zones


Comparison of Format Functions

Compared: Standard C library, Microsoft .NET, JDK, ICU

Basic format function
– Standard C library: strftime/wcsftime
– Microsoft .NET: DateTime
– JDK: SimpleDateFormat
– ICU: SimpleDateFormat

Predefined format patterns
– Standard C library: LC_TIME (date, time, date & time)
– Microsoft .NET: DateTimeFormatInfo (date (long/short), time (long/short), date & time, month and day, year and month)
– JDK: DateFormat constructor (date, time, date & time; 4 different lengths for each: full/long/medium/short)
– ICU: DateFormat constructor (same as JDK, plus support for arbitrary combinations of date fields using a “skeleton” pattern)

Localized month/day names
– Standard C library: LC_TIME (full & abbreviated)
– Microsoft .NET: DateTimeFormatInfo (full & abbreviated, genitive month, shortest day names)
– JDK: DateFormatSymbols (full & abbreviated)
– ICU: DateFormatSymbols (full/abbreviated/narrow, formatting/standalone)

Relative
– Standard C library: n/a
– Microsoft .NET: n/a
– JDK: n/a
– ICU: DateFormat (RelativeDateFormat)

Interval
– Standard C library: n/a
– Microsoft .NET: n/a
– JDK: n/a
– ICU: DateIntervalFormat

Duration
– Standard C library: n/a
– Microsoft .NET: n/a
– JDK: n/a
– ICU: TimeUnitFormat

Calendar system
– Standard C library: Gregorian and its variants
– Microsoft .NET: 15 calendar types
– JDK: Gregorian, Thai Buddhist and Japanese
– ICU: 11 calendar types

Page 16: Knotty problems in date/time parsing and formatting and time zones


Design/Implementation Tips

Keep internal date/time representation locale-independent
– Localized format may vary depending on implementation
– Use a standard format such as ISO 8601 for data exchange
Do not hardcode format patterns in your source code
Do not put format patterns in resource bundles with other localizable messages!
– Locale support is more than UI translation
– Translation vendors are usually not able to handle regional variants
– You should be able to find solutions in CLDR/ICU; if none is available, file bugs to request new features
Avoid date/time data entry by text
– Formatting date/time is complicated, and so is parsing
– Use a UI widget to eliminate ambiguous data entry
Understand regional conventions of the calendar system
– Rules for calculating some calendar fields may vary
Be prepared to support non-Gregorian calendar systems
– For example,
  – The Buddhist calendar is the most preferred calendar system in Thailand
  – Japanese calendar support may be required depending on target sectors

Page 17: Knotty problems in date/time parsing and formatting and time zones


Understanding Time Zone Formatting and Parsing

CLDR’s approach for supporting time zone formatting
Choosing the right time zone format type for your needs
Tips for processing date/time with time zone

http://www.time.gov/images/worldzones.gif

Page 18: Knotty problems in date/time parsing and formatting and time zones


Time Zone Implementations

The tz database (a.k.a. the Olson database)
– 568 zones (436 unique zones / 132 aliases) (2008d)
– Supports historic time transitions since the late 19th century
– At least 1 zone per country/region
– Time zone abbreviations for display (3 or 4 ASCII letters), such as “EST”, “JST”…
– Used by *nix systems (Solaris, Linux, AIX, Mac OS X…) and Java

MS Windows time zone
– 84 zones (Windows Vista), some are obsolete
– Supports historic rules (2005 and beyond) in Vista/2008 Server (Dynamic DST)
– A zone is shared by multiple cities/countries
– Time zone display names include the standard offset and a common name or exemplar cities, such as “(GMT-05:00) Eastern Time (US & Canada)”, “(GMT+09:00) Osaka, Sapporo, Tokyo”…

Page 19: Knotty problems in date/time parsing and formatting and time zones


Time Zone Format Types in CLDR (1)

Generic location format
– Designed for populating choice lists for time zones
– Uniquely mapped to “canonical” zone IDs
– Examples
  – Europe/Rome ⇔ Italy Time [en]
  – America/New_York ⇔ United States (New York) Time [en]
  – America/New_York ⇔ Hora de Estados Unidos (New York) [es]

Generic non-location format
– Designed for recurring events, meetings, or anywhere people do not want to be overly specific
– Two widths: long/short
– Examples
  – America/New_York ⇒ ET [en/short]
  – America/New_York ⇒ Eastern Time [en/long]
  – America/Montreal ⇒ Eastern Time [en/long]

Page 20: Knotty problems in date/time parsing and formatting and time zones


Time Zone Format Types in CLDR (2)

Generic partial location format
– A variant of the generic non-location format, used as a fallback name when the generic non-location format is not specific enough
– Two widths: long/short
– Examples
  – America/Mexico_City ⇒ Hora central (Ciudad de México) [es_US/short/Mar 9 – April 6, 2008]
  – America/Chicago ⇒ Hora central (Chicago) [es_MX/short/Mar 9 – April 6, 2008]

Specific (non-location) format
– Designed to distinguish between standard time and daylight time
– Two widths: long/short
– Examples
  – America/New_York ⇒ EST [en/short/standard time]
  – America/New_York ⇒ EDT [en/short/daylight time]
  – America/New_York ⇒ Eastern Standard Time [en/long/standard time]
  – America/Montreal ⇒ Eastern Standard Time [en/long/standard time]

Page 21: Knotty problems in date/time parsing and formatting and time zones


Time Zone Format Types in CLDR (3)

Localized GMT format
– Designed for representing the offset from GMT
– Local decimal digits are used
– Examples
  – America/New_York ⇒ GMT-05:00 [en/standard time]
  – America/New_York ⇒ GMT-04:00 [en/daylight time]
  – America/New_York ⇒ Гриинуич-0500 [bg/standard time]

RFC 822 format
– Locale-insensitive “fixed” format representing the offset from GMT, as defined by RFC 822
– ASCII decimal digits are always used
– Examples
  – America/New_York ⇒ -0500 [standard time]
  – America/New_York ⇒ -0400 [daylight time]

Page 22: Knotty problems in date/time parsing and formatting and time zones


CLDR Metazone

A metazone is a grouping of one or more internal zones that share common non-location display names
– The following zones are currently associated with the metazone “America_Eastern” (CLDR 1.6.1):
  America/Nassau, America/Resolute, America/Coral_Harbour, America/Thunder_Bay, America/Nipigon, America/Toronto, America/Montreal, America/Iqaluit, America/Pangnirtung, America/Port-au-Prince, America/Jamaica, America/Cayman, America/Panama, America/Grand_Turk, America/Indiana/Vincennes, America/Indiana/Petersburg, America/Indiana/Marengo, America/Indiana/Winamac, America/Indianapolis, America/Louisville, America/Indiana/Vevay, America/Kentucky/Monticello, America/Detroit, America/New_York

Each metazone has a set of localizable names
– The following names are used for the metazone “America_Eastern” (CLDR 1.6.1)

en
– long: Eastern Time (generic), Eastern Standard Time (standard), Eastern Daylight Time (daylight)
– short: ET (generic), EST (standard), EDT (daylight)

fr
– long: Heure de l’Est (generic), Heure normale de l’Est (standard), Heure avancée de l’Est (daylight)
– short: HE (generic), HNE (standard), HAE (daylight)

zh_Hans
– long: 美国东部时间 (generic), 东部标准时间 (standard), 东部夏令时间 (daylight)
– short: ET (generic), EST (standard), EDT (daylight)
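
The indirection through a metazone can be pictured as two lookups: zone ID to metazone, then metazone to localized names. The Python sketch below assumes a small excerpt of the data listed above (three zones, two locales); `ZONE_TO_METAZONE`, `METAZONE_NAMES` and `display_name` are hypothetical names, and real CLDR mappings also depend on the date, which is ignored here.

```python
# Zone ID -> metazone (excerpt of the America_Eastern list above).
ZONE_TO_METAZONE = {
    "America/New_York": "America_Eastern",
    "America/Montreal": "America_Eastern",
    "America/Detroit": "America_Eastern",
}

# Metazone -> locale -> width -> name type (excerpt of the names above).
METAZONE_NAMES = {
    "America_Eastern": {
        "en": {
            "long": {"generic": "Eastern Time",
                     "standard": "Eastern Standard Time",
                     "daylight": "Eastern Daylight Time"},
            "short": {"generic": "ET", "standard": "EST", "daylight": "EDT"},
        },
        "fr": {
            "long": {"generic": "Heure de l’Est",
                     "standard": "Heure normale de l’Est",
                     "daylight": "Heure avancée de l’Est"},
            "short": {"generic": "HE", "standard": "HNE", "daylight": "HAE"},
        },
    },
}


def display_name(zone_id, locale, width, name_type):
    """Return the non-location name for a zone, or None if it has no metazone."""
    metazone = ZONE_TO_METAZONE.get(zone_id)
    if metazone is None:
        return None
    return METAZONE_NAMES[metazone][locale][width][name_type]


print(display_name("America/Montreal", "fr", "long", "standard"))
print(display_name("America/New_York", "en", "short", "daylight"))
```

Because names hang off the metazone, translators supply one set of strings per metazone instead of one per zone ID.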

Page 23: Knotty problems in date/time parsing and formatting and time zones


Time Zone Short Abbreviation Problem

Abbreviations of 2 to 4 ASCII letters are used for short names, such as ET, EST, PDT…

The extent to which time zone abbreviations are understood varies heavily by region
– For example, how many people in the US recognize EAT (East Africa Time)?

CLDR’s solution: a boolean value, “commonlyUsed”, associated with a zone/metazone to enable/disable short abbreviations
– Metazone “Africa_Eastern” has a short standard name “EAT” for English locales
– For metazone “Africa_Eastern”
  – commonlyUsed = true in en_ZA [English (South Africa)]
  – commonlyUsed = false in en_US [English (United States)]
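
The effect of the flag can be sketched as a guarded lookup that falls back to the localized GMT format, which is the fallback the pattern tables later in this deck list for the short specific format. The Python below is an illustration only: the data is the Africa_Eastern example above, the +3 hour offset of East Africa Time is supplied by the caller, and `SHORT_STANDARD`, `localized_gmt` and `short_specific_name` are hypothetical names.

```python
# (metazone, locale) -> (short standard name, commonlyUsed flag),
# taken from the Africa_Eastern example above.
SHORT_STANDARD = {
    ("Africa_Eastern", "en_ZA"): ("EAT", True),
    ("Africa_Eastern", "en_US"): ("EAT", False),
}


def localized_gmt(offset_minutes):
    """English localized GMT format, e.g. -300 -> 'GMT-05:00'."""
    sign = "-" if offset_minutes < 0 else "+"
    hours, minutes = divmod(abs(offset_minutes), 60)
    return f"GMT{sign}{hours:02d}:{minutes:02d}"


def short_specific_name(metazone, locale, offset_minutes):
    """Use the abbreviation only where it is flagged as commonly used."""
    name, commonly_used = SHORT_STANDARD.get((metazone, locale), (None, False))
    if name is not None and commonly_used:
        return name
    return localized_gmt(offset_minutes)


# East Africa Time is 3 hours ahead of GMT (180 minutes).
print(short_specific_name("Africa_Eastern", "en_ZA", 180))  # EAT
print(short_specific_name("Africa_Eastern", "en_US", 180))  # GMT+03:00
```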

Page 24: Knotty problems in date/time parsing and formatting and time zones


Ambiguous Time with Generic format

Daylight ⇒ Standard transition
– Sunday, November 2, 2008 01:30:00 Pacific Time?
  – Valid, happens twice
  – Generic format cannot distinguish between 1:30 PST and 1:30 PDT

Standard ⇒ Daylight transition
– Sunday, March 9, 2008 02:30:00 Pacific Time?
  – Invalid!
  – 30 minutes and 1 second after 01:59:59? Or 30 minutes before 03:00:00?
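
Both cases can be detected by asking under which offsets a given wall-clock time really occurs. The Python sketch below hardcodes the two 2008 US Pacific transitions from the examples above (expressed in UTC), so it is valid for 2008 only and does not consult the tz database. `offset_at`, `valid_offsets` and `classify` are hypothetical names.

```python
from datetime import datetime, timedelta

PST = timedelta(hours=-8)
PDT = timedelta(hours=-7)

# 2008 US Pacific transitions in UTC (hardcoded for illustration).
DST_START_UTC = datetime(2008, 3, 9, 10, 0)   # 02:00 PST -> 03:00 PDT
DST_END_UTC = datetime(2008, 11, 2, 9, 0)     # 02:00 PDT -> 01:00 PST


def offset_at(utc):
    """Offset in effect at a UTC instant during 2008."""
    return PDT if DST_START_UTC <= utc < DST_END_UTC else PST


def valid_offsets(local):
    """Offsets under which the wall-clock time `local` actually occurs."""
    result = []
    for offset in (PST, PDT):
        utc = local - offset          # candidate instant for this offset
        if offset_at(utc) == offset:  # keep it only if the rules agree
            result.append(offset)
    return result


def classify(local):
    count = len(valid_offsets(local))
    return {0: "invalid", 1: "unique", 2: "ambiguous"}[count]


print(classify(datetime(2008, 11, 2, 1, 30)))  # ambiguous
print(classify(datetime(2008, 3, 9, 2, 30)))   # invalid
print(classify(datetime(2008, 7, 1, 12, 0)))   # unique
```

An application that accepts generic-format input has to pick a policy for the "ambiguous" and "invalid" results, for example preferring the earlier instant or rejecting the input.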

Page 25: Knotty problems in date/time parsing and formatting and time zones


CLDR Time Zone Formatting Patterns (1)

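z, width 1…3
– Format: Specific non-location short format (commonlyUsed = true) ⇒ Localized GMT format
– Examples: PST, PDT, GMT-08:00
– Roundtrip time: yes
– Roundtrip canonical zone: no

z, width 4
– Format: Specific non-location long format ⇒ Localized GMT format
– Examples: Pacific Standard Time, Pacific Daylight Time, GMT-08:00
– Roundtrip time: yes
– Roundtrip canonical zone: no

Z, width 1…3
– Format: RFC 822 format
– Example: -0800
– Roundtrip time: yes
– Roundtrip canonical zone: no

Z, width 4
– Format: Localized GMT format
– Examples: GMT-08:00, Гриинуич-0800
– Roundtrip time: yes
– Roundtrip canonical zone: no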

Page 26: Knotty problems in date/time parsing and formatting and time zones


CLDR Time Zone Formatting Patterns (2)

v, width 1
– Format: Generic non-location short format (commonlyUsed = true) ⇒ Generic partial location short format (commonlyUsed = true) ⇒ Localized GMT format
– Examples: PT, PT (Canada), PT (Yellowknife), GMT-08:00
– Roundtrip time: no (at transition)
– Roundtrip canonical zone: no

v, width 4
– Format: Generic non-location long format ⇒ Generic partial location long format ⇒ Localized GMT format
– Examples: Pacific Time, Pacific Time (Canada), Pacific Time (Yellowknife), GMT-08:00
– Roundtrip time: no (at transition)
– Roundtrip canonical zone: no

V, width 1
– Format: Specific non-location short format ⇒ Localized GMT format
– Examples: PST, PDT, GMT-08:00
– Roundtrip time: yes
– Roundtrip canonical zone: no

V, width 4
– Format: Generic location format ⇒ Localized GMT format (only for GMT style time zones such as Etc/GMT+8)
– Examples: Italy Time, United States (Los Angeles) Time, GMT-08:00
– Roundtrip time: no (at transition)
– Roundtrip canonical zone: yes

Page 27: Knotty problems in date/time parsing and formatting and time zones


Tips for Processing Date/Time with Time Zone

For serializing future date/time data in text format, use RFC 822 format with zone ID
– Time zone rules may change
– GMT offset information along with the zone ID is sufficient to fix up the data

The result of java.util.Date#toString() might be ambiguous
– “CST” is used for both “America/Chicago” and “Asia/Shanghai” in Java
– CLDR does not use the same name for multiple time zones or metazones

Many zones in the tz database use LMT (Local Mean Time) as the initial offset
– LMT is calculated from the longitude, so the GMT offset has a fraction of a minute
– ISO 8601 / RFC 822 / Java GMT formats do not have a seconds field, so such offsets may not roundtrip

Minimize dependencies on Windows time zones in multi-platform applications
– Some Windows time zones are not well maintained
– No historic time zone rule support before Vista/2008 Server
– Mapping between Windows time zones and the tz database is 1-to-n
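
The first and third tips can be illustrated together. The Python sketch below writes a local time, its RFC 822 style offset and the zone ID into one text record and reads it back. The space-separated record layout is an assumption made for this example, not a standard, and all function names are hypothetical. The last lines show why an LMT offset does not roundtrip: America/New_York starts with LMT -4:56:02 in the tz database, and the seconds are lost in +hhmm form.

```python
from datetime import datetime, timedelta


def rfc822_offset(offset):
    """Format a UTC offset as +hhmm / -hhmm; any seconds are dropped."""
    total = int(offset.total_seconds())
    sign = "-" if total < 0 else "+"
    hours, remainder = divmod(abs(total), 3600)
    return f"{sign}{hours:02d}{remainder // 60:02d}"


def parse_rfc822_offset(text):
    sign = -1 if text[0] == "-" else 1
    return sign * timedelta(hours=int(text[1:3]), minutes=int(text[3:5]))


def serialize(local, offset, zone_id):
    """Record layout (assumed): 'YYYY-MM-DD HH:MM:SS +hhmm Zone/ID'."""
    return f"{local:%Y-%m-%d %H:%M:%S} {rfc822_offset(offset)} {zone_id}"


def deserialize(text):
    date_part, time_part, offset_part, zone_id = text.split(" ")
    local = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
    return local, parse_rfc822_offset(offset_part), zone_id


record = serialize(datetime(2009, 7, 1, 9, 0), timedelta(hours=-7),
                   "America/Los_Angeles")
print(record)
print(deserialize(record))

# LMT offset of America/New_York: the 2 seconds do not survive formatting.
lmt = timedelta(hours=-4, minutes=-56, seconds=-2)
print(rfc822_offset(lmt))
print(parse_rfc822_offset(rfc822_offset(lmt)) == lmt)
```

If the rules for the stored zone ID change later, a reader can compare the stored offset with the offset the current rules give for that local time and decide how to repair the record.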

Page 28: Knotty problems in date/time parsing and formatting and time zones


Links

Unicode CLDR project - http://www.unicode.org/cldr/
UTS#35 Unicode Locale Data Markup Language (LDML) - http://www.unicode.org/reports/tr35/
ICU Project - http://icu-project.org/
tz database - http://www.twinsun.com/tz/tz-link.htm