ibm i globalization v3.11

CEC2011 – IBM i Globalization [email protected]

Page 1: ibm i globalization v3.11

CEC2011 – IBM i [email protected]

Page 2: ibm i globalization v3.11

My profile

Page 3: ibm i globalization v3.11

keywords

• I(nternationalizatio)n: i18n
– Process of producing a product (design and code) independent of language, script, culture or character set
– Neutral

+

• L(ocalizatio)n: l10n
– Process of adapting an internationalized product to specific languages, scripts, cultures and character sets
– Customize, extend

=

Page 4: ibm i globalization v3.11

keywords

• G(lobalizatio)n: g11n
– Proper design and execution so that one instance of software, executing on a single machine, can process multilingual data and present it in a culturally correct way in a multicultural environment
– G11N = I18N + L10N + Multilingual Support

Page 5: ibm i globalization v3.11

Character representation

• Some characters from Italy, Germany, France, China, Greece, Sweden, Japan…

Page 6: ibm i globalization v3.11

Character representation

• CS – Character set

– A collection of elements used to represent textual information (e.g. 0-9, a-z, A-Z, .,;:!? … )

– A Character Set generally supports more than one language

Page 7: ibm i globalization v3.11

Character SET a subset of chars

• CS 695 – Euro Country Extended Code Page

Page 8: ibm i globalization v3.11

Character SET a subset of chars

• CS 925 – Greece

Page 9: ibm i globalization v3.11

Character SET a subset of chars

• CS 1172 – Japanese alpha and Katakana

Page 10: ibm i globalization v3.11

Character SET a subset of chars

• CS 1150 – Cyrillic Russian

Page 11: ibm i globalization v3.11

Character SET a subset of chars

• CS 1174 – People’s Republic of China

Page 12: ibm i globalization v3.11

Character SET a subset of chars

Page 13: ibm i globalization v3.11

Code Page

• Code Page (CP)

– Defines a subset of characters from a Character Set

– Each character in a character set is assigned a numerical representation (Hex Code)
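
As a small illustration of a code page assigning a hex code to each character, the sketch below uses Python's built-in `cp037` codec as a stand-in for code page 37 (EBCDIC USA/Canada). The helper name `hex_code` is an assumption made for this example, not something from the slides.

```python
# Sketch: a code page maps each character to a numeric (hex) code point.
# Python's "cp037" codec stands in for code page 37 (EBCDIC USA/Canada).

def hex_code(char: str, codec: str = "cp037") -> str:
    """Return the hex code point of a single character in the given code page."""
    return char.encode(codec).hex().upper()

for ch in ("A", "a", "0"):
    print(ch, "->", "x'" + hex_code(ch) + "'")
```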

Page 14: ibm i globalization v3.11

CCSID

• A unique number (0-65535) used by IBM to uniquely identify a Character Set and a Code Page

• Defines an ENCODING Scheme

Page 15: ibm i globalization v3.11

Encoding Scheme

ES     Encoding Scheme
1100   EBCDIC, single-byte, no code extension is allowed
1301   EBCDIC, mixed single-byte and double-byte, using shift-in (SI) and shift-out (SO) code extension method
4100   ISO 8, single-byte, no code extension is allowed
7200   UCS-2, no code extension is allowed
7808   UTF-8, no code extension is allowed

Encoding Scheme

• EBCDIC – SBCS (1 byte/char)
• EBCDIC – DBCS (2 bytes/char)
• ASCII (1 byte/char)
• UNICODE (………)

Page 16: ibm i globalization v3.11

CCSID - Attributes

CCSID   Character Set   Code Page   Encoding Scheme   Description
37      697             37          1100              USA
273     697             273         1100              Germany
280     697             280         1100              Italy
1025    1150            1025        1100              Cyrillic Russian
1388    1174            836         1301              Simplified Chinese

Character Set
697     Latin 1
1150    Cyrillic Multilingual
1174    Simplified Chinese Ext (EBCDIC/PC Common)

Encoding Scheme
1100    EBCDIC, single-byte, no code extension is allowed. Number of States = 1.
1301    EBCDIC, mixed single-byte and double-byte, using shift-in (SI) and shift-out (SO) code extension method

Code Page
37      USA/Canada - CECP
273     Germany F.R./Austria - CECP
280     Italy - CECP
836     Simplified Chinese Extended
1025    Cyrillic multilingual

Page 17: ibm i globalization v3.11

CCSID

• Same CS (697 Latin-1), different CP: different CCSID, different character position

1140: USA
1144: ITA
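
The effect can be tried with Python's built-in EBCDIC codecs. Python ships no codec for 1144, so this sketch assumes `cp037` (USA/Canada) and `cp273` (Germany/Austria) instead, two code pages over the same Latin-1 character set; `code_points` is a name made up for the example.

```python
# Sketch: the same character can sit at different positions in two code pages
# that cover the same character set.

def code_points(char: str) -> dict:
    """Hex position of `char` in code pages 37 and 273."""
    return {cp: char.encode(cp).hex().upper() for cp in ("cp037", "cp273")}

print("A", code_points("A"))   # same position in both code pages
print("@", code_points("@"))   # different position in each code page
```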

Page 18: ibm i globalization v3.11

Fixed/Variant Code Points

VARIANT code points
Characters that do change hex values (position): §, £, #, $, @, !

FIXED code points
Characters that do NOT change hex values: A-Z, a-z, 0-9, ()/+-_*%.;:,

Hint: Avoid using characters that are not in the invariant character set for names and literals in programs.
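
One way to apply the hint is to check names against the fixed set before using them. This is a minimal sketch that takes the fixed characters exactly as listed on this slide; the function name `variant_chars` is an assumption for illustration.

```python
import string

# Fixed (invariant) characters as listed on the slide:
# A-Z, a-z, 0-9 and ()/+-_*%.;:,
INVARIANT = set(string.ascii_letters + string.digits + "()/+-_*%.;:,")

def variant_chars(name: str) -> list:
    """Return the characters of `name` that are outside the invariant set."""
    return [c for c in name if c not in INVARIANT]

print(variant_chars("CUSTOMER_NAME"))  # nothing to report
print(variant_chars("PRICE_$"))        # '$' is a variant character
```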

Page 19: ibm i globalization v3.11

SBCS-DBCS

• SBCS
– EBCDIC
– Each CCSID can store up to 256 chars (x’00’ to x’FF’)

• DBCS
– EBCDIC
– Each CCSID can store up to 65536 chars (x’0000’ to x’FFFF’)
– APAC only: Chinese (Simplified and Traditional), Japanese, Korean

Page 20: ibm i globalization v3.11

Data Integrity

• If characters are in both CCSIDs: OK, match!

• Else
– Roundtrip: ITA è → USA } → ITA è
– Substitution char: in some cases (e.g. FTP) the substitution char x’3F’ is used
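
The two outcomes, a clean match and a substitution, can be sketched in Python. Assumptions: `cp273` and `cp037` stand in for the Italian and US code pages (Python has no 280 codec), and Python's `errors="replace"` substitutes the character `?` rather than the EBCDIC substitute control x'3F'. The `convert` helper is hypothetical.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes between code pages; unmappable characters become '?'."""
    return data.decode(source).encode(target, errors="replace")

# 'è' exists in both Latin-1 based code pages, so nothing is lost.
german = "è".encode("cp273")
usa = convert(german, "cp273", "cp037")
back = convert(usa, "cp037", "cp273")
print(back == german)

# 'Ж' (Cyrillic) does not exist in code page 37: a substitution char is stored.
lossy = convert("Ж".encode("utf-8"), "utf-8", "cp037")
print(lossy.decode("cp037"))  # the original character cannot be recovered
```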

Page 21: ibm i globalization v3.11

!

• Never use CCSID 65535 in a multilingual environment

• 65535 means NO TRANSLATE
– turns off automatic conversion
– maintains the same code point across different code pages

• 65535 is OK in a single-language environment
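
A rough picture of what "no translate" means in practice: the bytes stay the same, but each job reads them through its own code page. The sketch assumes a German writer (`cp273`) and a US reader (`cp037`); the sample text is invented.

```python
# Bytes written by a German job (code page 273) and stored with no conversion.
original = "Größe @ 5"
stored = original.encode("cp273")

# A US job (code page 37) reads the same code points as different characters.
seen_by_us_job = stored.decode("cp037")
print(seen_by_us_job)

# With a correct CCSID tag the data is converted and the text survives.
print(stored.decode("cp273"))
```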

Page 22: ibm i globalization v3.11

Numeric columns NO CCSID

CCSID

PF-SRC

PF-DTA

Page 23: ibm i globalization v3.11

CCSID

Page 24: ibm i globalization v3.11

CCSID - escalation

• If the job CCSID is set, it is used.
• If the job CCSID is set to *USRPRF, then the user profile is checked.
• If the user profile CCSID is set, then it is used.
• If the user profile value is set to *SYSVAL, then the system value is checked.
• If the system value is set to 65535, then the language ID is checked.
• If the language ID value is set, then the QTQ_DEFAULT_CCSID is used; else the language ID is converted to a CCSID.
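
The chain above can be sketched as a small lookup. This is only an illustration of the order of checks as listed on the slide, not the operating system's actual logic: `resolve_ccsid` and its parameters are hypothetical, and the last step is simplified to a single language-derived default.

```python
NO_CONVERSION = 65535

def resolve_ccsid(job: str, user_profile: str, system_value: int,
                  language_default: int) -> int:
    """Walk job -> user profile -> system value -> language ID default."""
    if job != "*USRPRF":
        return int(job)              # job CCSID is set explicitly
    if user_profile != "*SYSVAL":
        return int(user_profile)     # user profile CCSID is set explicitly
    if system_value != NO_CONVERSION:
        return system_value          # system value is a real CCSID
    return language_default          # derived from the language ID

print(resolve_ccsid("*USRPRF", "*SYSVAL", 65535, 280))
```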

Page 25: ibm i globalization v3.11

iSeries Access for Windows

• Not UNICODE compliant
• Needs NL installation
• Depends on client (Windows) code page

Language         CCSID   Client Code Page
German           273     850
Italian          280     850
Russian          1025    866
Simpl. Chinese   1388    936

Page 26: ibm i globalization v3.11

iSeries Access for windows

Page 27: ibm i globalization v3.11

iSeries Access for Windows

Page 28: ibm i globalization v3.11

iSeries Access for Windows

Limits: 1 CCSID/Job

Page 29: ibm i globalization v3.11

National Language

• Primary and secondary Language

Page 30: ibm i globalization v3.11

National Language

• Primary and secondary Language

Page 31: ibm i globalization v3.11

National Language

• Primary and secondary Language

Page 32: ibm i globalization v3.11

About CP, CS, CCSID

http://www-01.ibm.com/software/globalization/g11n-res.html

Page 33: ibm i globalization v3.11

Limits

• SBCS/DBCS
• Limits: one CCSID (language) per work session
• Limits: one CCSID (language) per DB column
• Limits: more code (SBCS/DBCS)

Page 34: ibm i globalization v3.11

Unicode

• Single character set
– Contains all current and past languages
– A unique number for every character
– Different ways to store data (not only 16 bit)
– Has mappings to all character sets

Page 35: ibm i globalization v3.11

Unicode

• Now
– Hundreds of CCSIDs: one for each language (SBCS/DBCS)

• Unicode
– One encoding system includes all language characters

Page 36: ibm i globalization v3.11

Unicode

There is a code page for every language, each character being represented by a number

Page 37: ibm i globalization v3.11

Unicode - Endian

Little Endian (Intel)

Big Endian (i5)

NO Endian

UTF16 BE

UTF16 LE
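
The byte order difference is easy to see on a single character. A minimal sketch using Python's standard codecs; the variable names are only for the example.

```python
text = "A"  # U+0041

be = text.encode("utf-16-be").hex()  # big endian: high byte first
le = text.encode("utf-16-le").hex()  # little endian: low byte first
u8 = text.encode("utf-8").hex()      # UTF-8 has no byte-order variants

print("UTF16 BE:", be)
print("UTF16 LE:", le)
print("UTF8    :", u8)
```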

Page 38: ibm i globalization v3.11

Unicode - Encodings

First version of Unicode: 2 bytes/char, 65536 characters

Version 2: multibyte, > 1 million characters

Unicode has three widely accepted encoding schemes, or Unicode transformation formats (UTFs):

– UTF-8
– UTF-16 (default)
– UTF-32

Page 39: ibm i globalization v3.11

Unicode - Encodings

• Unicode supports 3 UTF formats

– UTF8
  No endian
  Web
  Multibyte

– UTF16
  Little/big endian (little: Intel)
  Host languages on i5 (RPG/CBL)

– UTF32
  No support on i5

Page 40: ibm i globalization v3.11

Unicode - Encodings

UTF8: 8 bit blocks

ABC x’414243’

UTF16: 16 bit blocks

ABC x’004100420043’

UTF32: 32 bit blocks

ABC x’000000410000004200000043’
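
These hex strings can be reproduced with Python's standard codecs. The sketch assumes big-endian byte order for UTF-16 and UTF-32, and `hex_literal` is a helper name invented for the example.

```python
def hex_literal(text: str, encoding: str) -> str:
    """Render the encoded bytes of `text` as an x'..' hex literal."""
    return "x'" + text.encode(encoding).hex().upper() + "'"

for encoding in ("utf-8", "utf-16-be", "utf-32-be"):
    print(encoding, hex_literal("ABC", encoding))
```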

Page 41: ibm i globalization v3.11

Unicode - Multibyte

UTF8 (example): depending on the first bits…
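
In UTF-8 the leading bits of the first byte say how many bytes the character occupies. A minimal sketch of that rule; the function name is an assumption for illustration.

```python
def utf8_sequence_length(first_byte: int) -> int:
    """Number of bytes in a UTF-8 sequence, read from the first byte's leading bits."""
    if first_byte >> 7 == 0b0:
        return 1            # 0xxxxxxx
    if first_byte >> 5 == 0b110:
        return 2            # 110xxxxx
    if first_byte >> 4 == 0b1110:
        return 3            # 1110xxxx
    if first_byte >> 3 == 0b11110:
        return 4            # 11110xxx
    raise ValueError("continuation byte (10xxxxxx) or invalid lead byte")

print(utf8_sequence_length(0b11100100))
```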

Page 42: ibm i globalization v3.11

Unicode – Multibyte - example

UTF8: 11100100-10001000-10101101

UTF16 BE

UTF16 LE
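
The three UTF8 bytes of this example can be decoded to recover the code point and its two UTF16 byte orders. A short sketch with Python's standard codecs; variable names are only for the example.

```python
# The three UTF-8 bytes from the slide.
raw = bytes([0b11100100, 0b10001000, 0b10101101])

char = raw.decode("utf-8")
code_point = ord(char)

be = char.encode("utf-16-be").hex().upper()  # high byte first
le = char.encode("utf-16-le").hex().upper()  # low byte first

print(f"U+{code_point:04X}  UTF16 BE x'{be}'  UTF16 LE x'{le}'")
```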

Page 43: ibm i globalization v3.11

Unicode - CCSID

Encoding   CCSID   Note                    Char Unit
UTF-8      1208    from 5.3                8 bit
UTF-16     1200    from 5.3                16 bit
UTF-32     NA                              32 bit
UCS-2      13488   superseded --> UTF-16   16 bit
UCS-4      NA                              32 bit

UTF-8 (Unicode Transformation Format) is a mapping algorithm: 1 char = 1-n octets
Memory usage depends on the language, e.g.
– English: 1 byte/char
– Greek/Russian/Arabic/Hebrew: 1.7 bytes/char
– Other European languages: 1.1 bytes/char
– Chinese/Japanese/Hindi/Korean: 3 bytes/char

UTF16
1 char = 1-n 16 bit groups
UTF-16 is the standard for Unicode.

UCS-2 (Universal Multiple-Octet Coded Character Set): superseded by UTF16

UTF8
CCSID: 1208
Data type: CHAR

UTF16
CCSID: 1200 (or 13488)
Data type: Graphic
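
The memory usage figures can be checked on short samples. The sample words below are invented for the example and contain letters only, so they show the per-letter cost; running text with spaces and digits averages lower, which is consistent with the figures quoted above.

```python
def bytes_per_char(text: str, encoding: str = "utf-8") -> float:
    """Average number of bytes used per character in the given encoding."""
    return len(text.encode(encoding)) / len(text)

samples = {"English": "hello", "Russian": "привет", "Chinese": "你好"}
for language, word in samples.items():
    print(language, bytes_per_char(word), "bytes/char in UTF-8")
```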

Page 44: ibm i globalization v3.11

Unicode

Page 45: ibm i globalization v3.11

Unicode

Remember…
5250 screen: 1 CS, NO UNICODE allowed

But…

Page 46: ibm i globalization v3.11

Unicode – i access for WEB

Russian

English

Chinese

Page 47: ibm i globalization v3.11

iSeries Navigator and Unicode

Page 48: ibm i globalization v3.11

Unicode - enabled software

• Unicode-enabled software
– WebSphere
– Lotus Domino
– DB2 UDB
– IFS
– Web browsers
– XML
– Java

• i5/OS components not Unicode enabled
– QSYS library system
– OS/400 message files
– Personal Communications

Page 49: ibm i globalization v3.11

USER Interface

• DDS-5250

• JDBC-ODBC-WEB
– Rewrite apps

Page 50: ibm i globalization v3.11

RPG and Unicode

Default: CCSID 13488

If you need CCSID 1200

Unicode

Page 51: ibm i globalization v3.11

RPG and Unicode

Very easy!
Remember: Char and Unicode have different weight

Page 52: ibm i globalization v3.11

CCSID to CCSID

• LF support

• iconv()
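
The idea of a CCSID to CCSID conversion can be sketched in Python as a decode followed by an encode. This is an analogue, not the iconv() API: the table `CCSID_TO_CODEC` and the function `convert_ccsid` are hypothetical, and treating 1200 and 13488 as big-endian UTF-16 is an assumption.

```python
# Hypothetical mapping from CCSIDs named in this deck to Python codec names.
CCSID_TO_CODEC = {
    37: "cp037",         # EBCDIC USA
    273: "cp273",        # EBCDIC Germany
    1208: "utf-8",
    1200: "utf-16-be",   # assumed big endian
    13488: "utf-16-be",  # UCS-2, treated here like UTF-16
}

def convert_ccsid(data: bytes, from_ccsid: int, to_ccsid: int) -> bytes:
    """Reinterpret `data` from one CCSID and re-encode it in another."""
    text = data.decode(CCSID_TO_CODEC[from_ccsid])
    return text.encode(CCSID_TO_CODEC[to_ccsid])

print(convert_ccsid("A".encode("cp037"), 37, 1208))
```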

Page 53: ibm i globalization v3.11

Something about IFS

• Table fields have a CCSID tag

• Stream files in IFS have a CCSID tag

• Stream files on other systems don’t

Page 54: ibm i globalization v3.11

Something about IFS

How to translate correctly?

UTF16 BE

UTF16 LE

Page 55: ibm i globalization v3.11

Something about IFS

BOM – Byte order mark: first bytes of stream file

UTF16 BE

UTF16 LE
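
Reading the first bytes of a stream file is enough to spot a BOM. A minimal sketch using the BOM constants from Python's `codecs` module; `detect_bom` is a name invented for the example, and UTF-32 is left out on purpose.

```python
import codecs

def detect_bom(data: bytes):
    """Return (encoding name, BOM length), or (None, 0) when no BOM is present."""
    candidates = (
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),   # FE FF
        (codecs.BOM_UTF16_LE, "utf-16-le"),   # FF FE
    )
    for bom, name in candidates:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

encoding, skip = detect_bom(b"\xfe\xff\x00A")
print(encoding, b"\xfe\xff\x00A"[skip:].decode(encoding))
```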

Page 56: ibm i globalization v3.11

Something about IFS

Page 57: ibm i globalization v3.11

Something about IFS

iconv()

Page 58: ibm i globalization v3.11

Something about IFS

• Table fields have a CCSID tag

• Stream files in IFS have a CCSID tag

• Stream files on other systems don’t

• Stream files have a BOM

• Table columns don’t

Page 59: ibm i globalization v3.11

php

Means: PHP does not fully support UTF-16

Page 60: ibm i globalization v3.11

php – setup UTF8

Page 61: ibm i globalization v3.11

php – setup UTF8

Column DESCR CCSID 1208/13488/1200

Read correctly from 1208, 1200, 13488
Write correctly from PHP vars to 1208

Page 62: ibm i globalization v3.11

php

Page 63: ibm i globalization v3.11

Globalization guidelines

• User interface
– Messages, dialog boxes, online manuals, audio output, animations, windows, help text, tutorials, diagnostics, clip art, icons, and any presentation control that is necessary to convey information to users

• Culture and conventions
– Date and time, address, numeric shapes, numeric values

• Product structure

Page 64: ibm i globalization v3.11

User Interface

Variable order

Icons
– Avoid text in icons.
– Avoid internationally recognized symbols in icons (e.g. six-pointed star, cross/plus sign).
– Avoid the use of national flags in icons.

Line break rules
– You cannot use Latin script-based text formatting algorithms for Chinese/Japanese.

Page 65: ibm i globalization v3.11

Culture and conventions

Calendar
– Allow the user to select the calendar and calendar format.
– Be prepared to adapt to other calendar requirements.

Page 66: ibm i globalization v3.11

Culture and conventions

Date and Time

Country            Format
Russia             08 sen. 1994 g.
The Netherlands    08 september 1994
Bulgaria           1994-IX-08
Arabic countries   08/09/94
Germany            8.9.1994
Iran               1373/6/17
Islamic lunar      1415/4/2
Israel             3 Trishrey 5755

Country            Format
Canada             2.00 p.
Canada (Québec)    14 h
Italy              14.00
Sweden             kl 14.00
USA                2.00 p.
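
A table-driven formatter is one way to keep such conventions out of the program logic. The sketch below covers only the Gregorian rows of the date table and reproduces them for 8 September 1994; `FORMATTERS` and `format_date` are hypothetical names, and a real application would more likely rely on locale data than on hand-written rules.

```python
from datetime import date

ROMAN_MONTHS = ["I", "II", "III", "IV", "V", "VI",
                "VII", "VIII", "IX", "X", "XI", "XII"]
DUTCH_MONTHS = ["januari", "februari", "maart", "april", "mei", "juni", "juli",
                "augustus", "september", "oktober", "november", "december"]

# One formatting rule per country, kept apart from the calling code.
FORMATTERS = {
    "Germany": lambda d: f"{d.day}.{d.month}.{d.year}",
    "Arabic countries": lambda d: d.strftime("%d/%m/%y"),
    "Bulgaria": lambda d: f"{d.year}-{ROMAN_MONTHS[d.month - 1]}-{d.day:02d}",
    "The Netherlands": lambda d: f"{d.day:02d} {DUTCH_MONTHS[d.month - 1]} {d.year}",
}

def format_date(d: date, country: str) -> str:
    return FORMATTERS[country](d)

for country in FORMATTERS:
    print(country, format_date(date(1994, 9, 8), country))
```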

Page 67: ibm i globalization v3.11

Culture and conventions

Timezones

Time zones and daylight saving time (DST) affect time stamps.

There are some third-party products (e.g. TZN/400).

i5 system values don’t support different time zones.

LPAR can be a solution.

You can write your own routine: the offset can depend on the user, the information system… (Before trigger)
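
A home-grown routine of that kind could look roughly like this: store time stamps in UTC and apply a per-user offset on display. The user names, the offsets and the `local_time_for` function are invented for the example, and fixed offsets ignore DST.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-user UTC offsets in hours; a real application would read
# them from a table and would also have to account for DST.
USER_OFFSET_HOURS = {"ROSSI": 1, "SMITH": -5}

def local_time_for(user: str, stamp_utc: datetime) -> datetime:
    """Convert a UTC time stamp to the user's local time."""
    offset = timezone(timedelta(hours=USER_OFFSET_HOURS[user]))
    return stamp_utc.astimezone(offset)

stamp = datetime(2011, 3, 1, 12, 0, tzinfo=timezone.utc)
for user in USER_OFFSET_HOURS:
    print(user, local_time_for(user, stamp).isoformat())
```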

Page 68: ibm i globalization v3.11

Culture and conventions

Paper sizes
– Letter, A4…

Cardinal number shape

Numeric values
– Negative numbers format
– Decimal and thousands separators

Monetary amount

Country    Format
US         $12,345.67
US         USD 12,345.67
Denmark    kr 12.345,67
France     12 345,67 €
Portugal   12.345$67 €
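
The separators can be treated as parameters instead of being hard-coded. A minimal sketch; `format_amount` is a hypothetical helper, and currency symbols and their position are left out.

```python
def format_amount(value: float, thousands: str, decimal: str) -> str:
    """Format with two decimals using the given thousands and decimal separators."""
    us_style = f"{value:,.2f}"   # US style first: comma groups, dot decimal
    # Swap both separators in one pass so they cannot overwrite each other.
    return us_style.translate(str.maketrans({",": thousands, ".": decimal}))

print(format_amount(12345.67, ",", "."))   # US
print(format_amount(12345.67, ".", ","))   # Denmark
print(format_amount(12345.67, " ", ","))   # France
```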

Page 69: ibm i globalization v3.11

Culture and conventions

Measurement system
– Miles, inches, km, °C, °F…

First day of week

Address
– Fields, labels, presentation order

Telephone formats
– + - . numbers

Page 70: ibm i globalization v3.11

Product structure

Isolating culture and language sensitive parts
• Easy to change

Write one set of application source code that will work correctly, without modification, in each of the required countries or regions.
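
One common way to isolate the language-sensitive parts is a message catalog keyed by language, so that the program logic stays identical everywhere. The catalog contents and the `message` function below are invented for illustration.

```python
# Language-sensitive text lives in data, not in the program logic.
MESSAGES = {
    "en": {"greeting": "Welcome, {name}", "total": "Total"},
    "it": {"greeting": "Benvenuto, {name}", "total": "Totale"},
}

def message(lang: str, key: str, **values) -> str:
    """Look up a message in the user's language, falling back to English."""
    catalog = MESSAGES.get(lang, MESSAGES["en"])
    return catalog[key].format(**values)

print(message("it", "greeting", name="Anna"))
print(message("en", "greeting", name="Anna"))
```

Adding a language then means adding a catalog entry, with no change to the calling code.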

Page 71: ibm i globalization v3.11

TNX