
  • Copyright@W3C India Page 1

    Minimum Requirements on E-PUB for Indian languages text layout

    This Version: 1.0

    Latest Version: http://w3cindia.in/word_pdf/epub_mini_requr.pdf

    Working Draft: First working draft

    Introduction:

    This document describes the minimum requirements for Indian languages text layout in E-publishing content formats.

    This document covers major issues of E-content in Indian languages, with the aim of creating a standardized format for text layout. It covers storage, rendering problems, vertical writing, margin areas, page numbers, running heads, line breaking, etc., as well as CSS requirements for Indian languages.

    The main purpose of this document is to gather information from E-publishers about the page text layout they use for E-publishing.

    Storage Requirements:

    1. Support of Unicode 6.2 and IVS

    For global language support, EPUB should support Unicode, and should also support SVG fonts and IVS (Ideographic Variation Sequences).

    Unicode is the universal character encoding standard used for representing text for information processing. Unicode aims to encode all of the individual characters used in the written languages of the world, and the standard provides information about the characters and their use. Unicode is not limited to 16 bits: it defines a code space of 1,114,112 code points (U+0000 to U+10FFFF), far more than the 65,536 values of a single 16-bit unit, and text is stored using the UTF-8, UTF-16 or UTF-32 encoding forms. It assigns each character a unique hexadecimal numeric value (its code point) and a name.

    Reference URL : http://www.unicode.org/versions/Unicode6.2.0/
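    Here is a minimal CSS sketch showing how hexadecimal code points are used in practice. It makes three assumptions:

    - The stylesheet is saved as UTF-8.
    - A hypothetical web font named "Example Devanagari" (the file path is a placeholder) is restricted with `unicode-range` to the Devanagari block, U+0900 to U+097F.
    - A hypothetical class `verse-end` inserts U+0964 DEVANAGARI DANDA (purna viram) through a CSS hexadecimal escape.

```css
@charset "UTF-8";
/* @charset must be the very first thing in the stylesheet file. */

/* Hypothetical font, used only for characters in the Devanagari block. */
@font-face {
  font-family: "Example Devanagari";
  src: url("fonts/ExampleDevanagari.woff2") format("woff2");
  unicode-range: U+0900-097F;
}

/* CSS hexadecimal escape: \964 is U+0964 DEVANAGARI DANDA. */
.verse-end::after {
  content: "\964";
}
```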


    CLDR (Common Locale Data Repository): The Common Locale Data Repository is the largest standard repository of locale data in the world. It is a project of the Unicode Consortium. It provides locale data in an XML format for use in computer applications, and it facilitates the sharing of locale-related information among applications regardless of their domains. Its goal is to provide basic linguistic information for diverse locales in an open, interoperable form.

    This data is usable for localizing applications.

    Some examples of the information that CLDR gathers for languages and territories are:

    - Date formats
    - Time zones
    - Number formats
    - Currency and its formats
    - Measurement systems
    - Collation (sort order) specification: sorting, searching and matching
    - Translations of names for languages, territories, scripts, time zones and currencies
    - Script and exemplar characters used by a language
    - Calendaring rules, formats and important dates
    - Specification of selected but universal cultural terminologies

    Reference URL: http://cldr.unicode.org/

    IVS (Ideographic Variation Sequence): Characters in the Unicode Standard

    can be represented by a wide variety of glyphs. Occasionally the need arises

    in text processing to restrict or change the set of glyphs that are to be used

    to display a character. In special circumstances, this restriction needs to be

    expressed in plain text rather than by font selection or some other rich text

    mechanism. The Unicode Standard accommodates those circumstances

    with variation selectors: the code point of a graphic character can be

    followed by the code point of a variation selector to identify a restriction on

    the graphic character. The combination of a graphic character and a


    variation selector is known as a variation sequence. An Ideographic

    Variation Sequence (IVS) is a sequence of two coded characters, the first

    being a character with the Unified Ideograph property, the second being a

    variation selector character in the range U+E0100 to U+E01EF.

    A glyphic subset for a given character is a subset of the glyphs that are

    appropriate for displaying that character.

    Reference URL : http://www.unicode.org/reports/tr37/

    2. Fonts

    OpenType fonts map Unicode code points to the glyphs shown on the display; their character mapping is directly based on Unicode. OpenType provides a series of enhancements to the TrueType format. The most significant of these allows PostScript font data to be nested inside a TrueType software wrapper.

    OpenType allows type designers and font foundries to create larger character sets within fonts. Legacy Type 1 fonts, and TrueType fonts built for 8-bit encodings, were in practice limited to 256 characters. If a typeface designer wanted to create an extended ligature set, small caps, swash and alternate characters, or characters to support multiple languages, these had to be put into another font. The large character-set capabilities of OpenType allow type designers much more latitude in typeface design, resulting in better graphic communication.

    SVG Fonts: The purpose of SVG fonts is to allow for delivery of glyph

    outlines in display-only environments. SVG fonts that accompany Web

    pages must be supported only in browsing and viewing situations. Graphics

    editing applications or file translation tools must not attempt to convert

    SVG fonts into system fonts.

    Reference URL: http://www.w3.org/TR/SVG/fonts.html

    WOFF (Web Open Font Format):

    This format was designed to provide lightweight, easy-to-implement compression of font data, suitable for use with the @font-face CSS declaration. Any TrueType, OpenType or Open Font Format file can be losslessly converted to WOFF for Web use (subject to licensing of the font data). Once decoded by a user agent, the WOFF font will display identically to the original desktop font from which it was created.

    The WOFF format also allows additional metadata to be attached to the

    file; this can be used by font designers to include licensing or other

    information, beyond that present in the original font. Such metadata does

    not affect the rendering of the font in any way, but may be displayed to the

    user on request.

    Reference URL: http://www.w3.org/TR/WOFF/
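    Here is a minimal sketch of serving a font as WOFF through @font-face. It assumes a hypothetical Bangla font family and placeholder file paths.

    - WOFF 2.0, a later W3C format with better compression, is listed first.
    - A user agent uses the first source whose format it supports, so the original TrueType file acts as the last fallback.

```css
/* Hypothetical family name and file paths. */
@font-face {
  font-family: "Example Bangla";
  font-style: normal;
  font-weight: 400;
  src: url("fonts/ExampleBangla.woff2") format("woff2"),
       url("fonts/ExampleBangla.woff") format("woff"),
       url("fonts/ExampleBangla.ttf") format("truetype");
}

/* Elements whose language (lang/xml:lang) is Bangla use the web font,
   with a generic fallback if it cannot be loaded. */
body:lang(bn) {
  font-family: "Example Bangla", serif;
}
```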

    Page text layout Requirements:

    The following issues should help in the implementation of text layout for

    Indian languages:

    Arrangement of Running Heads and Page Numbers

    The positioning of all running heads and page numbers in the same book should be consistent. The following ways might be used for positioning running heads and page numbers in the horizontal writing system:
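    One possible way to keep running heads and page numbers in consistent positions is CSS Paged Media, together with the Generated Content for Paged Media (GCPM) features `string-set` and `string()`. This sketch makes three assumptions:

    - The chapter title is in `h1`.
    - Page numbers are shown in Devanagari digits via the predefined `devanagari` counter style; use `decimal` for Western digits.
    - Facing pages carry the running head and page number on the outer side.

    These features are implemented mainly by print formatters. Most EPUB reading systems paginate on their own and ignore page-margin boxes.

```css
/* Capture each chapter title so it can be reused in the running head. */
h1 {
  string-set: chapter-title content();
}

/* The same margins on every page keep the layout consistent. */
@page {
  margin: 20mm 15mm;
}

/* Left-hand pages: running head and page number on the outer (left) side. */
@page :left {
  @top-left { content: string(chapter-title); }
  @bottom-left { content: counter(page, devanagari); }
}

/* Right-hand pages: running head and page number on the outer (right) side. */
@page :right {
  @top-right { content: string(chapter-title); }
  @bottom-right { content: counter(page, devanagari); }
}
```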


    Positioning of Consecutive Opening Brackets, Closing Brackets, Commas, Purna Viram, etc.

    When multiple punctuation marks (such as opening brackets, closing brackets, commas and the purna viram, ।) occur one after another, the spacing between them is adjusted.

    Vertical writing and horizontal writing

    When the principal text direction is horizontal, all text is in horizontal writing mode. This includes page headers/footers, page numbers, figure captions, table captions and table entries.

    When the principal text direction is vertical, all text, including table entries, is in vertical writing mode.
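    In CSS, the principal text direction can be set with the `writing-mode` property from CSS Writing Modes Level 3. This minimal sketch makes two assumptions:

    - The root element carries a hypothetical class, `horizontal` or `vertical`.
    - Vertical text uses `vertical-rl`, where lines progress right to left as in East Asian vertical text; `vertical-lr` is the alternative.

    Because `writing-mode` is inherited, headers, captions and table entries follow the root setting unless a rule overrides it.

```css
/* Principal direction horizontal: all text, including page headers/footers,
   captions and table entries, is laid out horizontally. */
html.horizontal {
  writing-mode: horizontal-tb;
}

/* Principal direction vertical: characters run top to bottom and
   successive lines are placed from right to left. */
html.vertical {
  writing-mode: vertical-rl;
}
```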


    Paragraph Adjustment Rules

    Line Head Indent at the Beginning of Paragraphs:

    A paragraph (a section of a document consisting of one or more sentences that express a distinct idea) usually begins on a new line, and its first line may be indented from the line head.

    Widow Adjustment of Paragraphs:

    Widow adjustment of paragraphs aims to prevent the last line of a paragraph from containing fewer than a given number of characters. This is also called "widow" processing.
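    Here is a minimal CSS sketch of these two rules, using assumed values (a 1em indent and 2 lines).

    - The CSS `widows` and `orphans` properties count lines kept together across page or column breaks. This is related to, but not the same as, the character-count rule described above.
    - `text-wrap: pretty` (CSS Text Level 4) asks the browser to choose better line breaks, which in current implementations usually avoids a very short last line. Its behaviour is implementation-defined, and support in reading systems is limited.

```css
p {
  text-indent: 1em;   /* line-head indent on the first line of each paragraph */
  widows: 2;          /* keep at least 2 lines together at the top of a new page/column */
  orphans: 2;         /* keep at least 2 lines together at the bottom of a page/column */
  text-wrap: pretty;  /* CSS Text 4: prefer better line breaks, e.g. no very short last line */
}

/* Common typographic convention (assumption): no indent right after a heading. */
h1 + p,
h2 + p {
  text-indent: 0;
}
```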

    Mixed Text Composition in Horizontal Writing Mode.

    In horizontal writing mode the basic approach is to use proportional Western fonts. Example of a proportional Western font used within Indian-language text in horizontal writing mode:

    India

    Mixed Text Composition in Vertical Writing Mode.

    In vertical writing mode the basic approach is also to use proportional Western fonts. Example of a proportional Western font used within Indian-language text in vertical writing mode:

    I
    n
    d
    i
    a
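    Here is a minimal sketch of mixed text composition in CSS. It makes three assumptions:

    - Latin-script words are marked with `lang="en"` inside Hindi text.
    - Vertical-mode containers carry a hypothetical class, `vertical`.
    - The font names are hypothetical.

    With the default `text-orientation: mixed`, Latin text in vertical mode is rotated sideways. `upright` stacks the letters one per line, as in the "India" example.

```css
/* Hindi body text in a Devanagari font (hypothetical font name). */
:lang(hi) {
  font-family: "Example Devanagari", serif;
}

/* English words marked lang="en" use a proportional Western font (hypothetical name). */
:lang(en) {
  font-family: "Example Latin", serif;
}

/* In vertical writing, set Latin letters upright, one per line. */
.vertical :lang(en) {
  text-orientation: upright;
}
```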


    Styling Requirements:

    The following CSS issues should help in the implementation of text layout

    for Indian languages:

    Drop first letter

    The first-letter pseudo-element represents the first letter of the first line of a block, if it is not preceded by any other content (such as images or inline tables) on its line. It allows that first letter to be styled individually, without markup. It may be used for "initial caps" and "drop caps", which are common typographical effects in text in Latin script.
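    Here is a minimal drop-cap sketch using `::first-letter`. It assumes a hypothetical class `chapter-opening` on the first paragraph.

    - For Indian scripts, how much of a syllable (for example, a conjunct with its matra) the pseudo-element captures has varied between browsers. Check the result in the target reading systems.
    - CSS Inline Layout Level 3 also defines an `initial-letter` property for this effect, but its support is limited.

```css
/* Drop cap on the opening paragraph of a chapter (hypothetical class name). */
p.chapter-opening::first-letter {
  float: left;            /* following lines wrap around the enlarged letter */
  font-size: 3em;
  line-height: 1;
  margin-right: 0.1em;    /* small gap between the drop cap and the text */
}
```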

    Vertical & horizontal writing

    Vertical arrangement of characters: if a string is written in vertical mode, writing each character on a new line may not be suitable.

    Example: vertical arrangement of the characters in Hindi.
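    Here are two possible CSS treatments for Hindi in vertical mode, sketched with hypothetical class names.

    - With `text-orientation: upright`, characters are stacked one typographic unit per line. In principle a consonant and its matra stay together, but how conjuncts are stacked differs between browsers.
    - With the default `mixed` value, the whole Hindi line is rotated 90 degrees clockwise, so words keep their normal shaping. This may be the better option where stacking each character is not suitable.

```css
/* Option A: characters stacked upright, one unit per line. */
.vertical-hindi-upright {
  writing-mode: vertical-rl;
  text-orientation: upright;
}

/* Option B: the line is rotated 90 degrees clockwise; words are shaped as in horizontal text. */
.vertical-hindi-rotated {
  writing-mode: vertical-rl;
  text-orientation: mixed;
}
```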

    Line breaking

    Unicode Line Breaking Algorithm, UAX #14 (word wrapping)

    Characters not starting a line: a line should not begin with any of the following characters: closing brackets (cl-02), hyphens (cl-03), dividing punctuation marks (cl-04), middle dots (cl-05), full stops (cl-06), commas (cl-07) and iteration marks (cl-09).

    Reference URL: http://unicode.org/reports/tr14/

    Reference URL: http://www.w3.org/TR/2007/WD-css3-text-20070306/#line-breaking
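    Under UAX #14, a conforming implementation does not break before certain characters, so a line does not begin with a purna viram or a closing bracket:

    - U+0964 DEVANAGARI DANDA and U+0965 DEVANAGARI DOUBLE DANDA have the line-break class BA.
    - Closing brackets have class CL or CP.

    The CSS properties below only adjust this default behaviour. The sketch assumes Hindi and Bangla content, and the property values are example choices.

```css
/* Hindi and Bangla text; the language codes are examples. */
:lang(hi),
:lang(bn) {
  word-break: normal;        /* break only at normal UAX #14 opportunities such as spaces */
  overflow-wrap: break-word; /* if a single word is wider than the line, allow it to break */
  line-break: strict;        /* strictest rules the UA supports; mainly affects CJK punctuation */
  hyphens: manual;           /* hyphenate only at soft hyphens (U+00AD) placed by the author */
}
```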

    Indentation

    Sometimes only some of the characters of a word are indented, as in figure-3:

    the is indented

    Example in Bangla:


    What should the solution or rule be for this type of styling issue in Indian languages? Some people say that such styling is done on the basis of the syllable, but what is the definition of a syllable? The definition of a syllable depends on the pronunciation of the word. In the example the syllables are , , , , but the styling is done as which is not as per the syllable. So we should define an explicit rule instead of defining it on a syllable basis.

    Letter spacing

    The same issue applies to horizontal spacing in Indic languages. In English, when letters are spaced out, as in C E R T I F I C A T E, a space is placed after every character. But in Indian languages such as Bangla and Assamese, the space may be placed not after every character but after certain portions of the character sequence, as in the figure below:

    Reference URL : http://www.w3.org/TR/2007/WD-css3-text-20070306/#letter-spacing
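    Here is a minimal sketch, assuming hypothetical class names.

    - For English, `letter-spacing` spaces every letter.
    - CSS Text Level 3 applies `letter-spacing` between typographic character units, so a conforming renderer should not split a cluster. It would still space every cluster, however.
    - To space only after chosen portions of a Bangla or Assamese word, one workaround is to wrap each portion in a span and add a margin after it. Split only at cluster boundaries so that shaping is not broken.

```css
/* English: space after every letter, as in "C E R T I F I C A T E". */
.spaced-latin {
  letter-spacing: 0.5em;
}

/* Bangla/Assamese (assumption): no per-character spacing; the author wraps
   each portion of the word in <span class="part"> and space follows each portion. */
.spaced-indic {
  letter-spacing: normal;
}
.spaced-indic > .part:not(:last-child) {
  margin-right: 0.5em;
}
```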


    Underlining

    In several Indian languages, matras become unreadable when the characters are underlined. Examples:

    Hindi

    Punjabi (matras are not readable)

    Bengali

    Gujarati

    Marathi

    Tamil

    Telugu (TV9)

    When these pages are viewed on the Internet, the information is not clearly readable. When Indian-language text is hyperlinked, the underline cuts through some modifiers (matras), and in Punjabi the underline coincides with some matras (the small u). This can make it difficult to read the information correctly. Therefore, some changes may need to be made to the CSS standards developed by W3C with respect to Indian languages.

    Reference URL: http://www.w3.org/TR/CSS2/text.html#decoration
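    Later CSS Text Decoration modules add controls that can reduce the collision between underlines and matras. This sketch assumes Hindi and Punjabi links.

    - The offset value is an example, and how well it works depends on the font's metrics.
    - Where these properties are unsupported, a `border-bottom` with some `padding-bottom` is a common fallback.

```css
/* Links in Hindi and Punjabi text; the language codes are examples. */
a:lang(hi),
a:lang(pa) {
  text-decoration-line: underline;
  text-underline-position: under;  /* usually places the line below descenders such as lower matras */
  text-underline-offset: 0.15em;   /* extra gap between the glyphs and the line */
  text-decoration-skip-ink: auto;  /* interrupt the line where it would touch glyph ink */
}
```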


    CSS Embedded fonts

    First, add the font file to your book files in the usual way (EPUB requires every resource, including fonts, to be listed in the package manifest). Then add an @font-face statement at the beginning of your CSS, something like this:

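    @font-face {
        font-family: "Prophecy Script";
        font-style: normal;
        font-weight: normal;
        /* path is relative to the CSS file; format() tells the reader the file type */
        src: url("Fonts/Prophecy_Script.ttf") format("truetype");
    }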

    That makes the font available. To apply it to your text, you have to add it

    to one of your styles, also in the CSS:

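    p.letter {
        font-family: "Prophecy Script", cursive; /* generic fallback if the font fails to load */
        font-weight: normal;
        font-style: normal;
        font-size: 1em;
        margin: 1em 0 0 0;
        -webkit-hyphens: none; /* older WebKit-based reading systems */
        hyphens: none;         /* standard property */
    }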


    Reference URL: http://w3cindia.in/cssdocument.html


    Styling issues for Urdu:

    Horizontal writing for Urdu

    Direction of writing: words are written in horizontal lines from right to left; numerals are written from left to right.

    Number of letters: 28 (in Arabic). Some additional letters are used in Arabic when writing place names or foreign words containing sounds that do not occur in Standard Arabic, such as /p/ or /g/. Additional letters are used when writing other languages.
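    Here is a minimal sketch for Urdu. It assumes the text is marked `lang="ur"` and uses a hypothetical font name.

    - In markup, the `dir="rtl"` attribute is the preferred way to set direction; the CSS below is a fallback.
    - Numerals need no extra rule: the Unicode Bidirectional Algorithm (UAX #9) lays out digit sequences left to right inside right-to-left text.

```css
/* Urdu text; prefer dir="rtl" in the markup, this rule is a fallback. */
:lang(ur) {
  direction: rtl;                          /* lines run right to left */
  text-align: start;                       /* the start edge is the right edge in RTL text */
  font-family: "Example Nastaliq", serif;  /* hypothetical font name */
}
```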


    First Letter

    In cursive text, such as Arabic and Urdu, the styling is applied to the whole word.
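    `::first-letter` would isolate a single letter and may break the cursive joining of the word. One workaround is to wrap the whole first word in an element and style that instead. Here is a sketch, assuming a hypothetical class `first-word`:

```css
/* Hypothetical markup: <p lang="ur"><span class="first-word">word</span> rest of the text</p>
   Styling the whole word keeps the letters of the word joined. */
p:lang(ur) .first-word {
  font-size: 1.8em;
  font-weight: bold;
}
```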


    Styling Requirements for Mobile:

    The following CSS properties for mobile must be supported for Indian languages in order to display E-content properly on mobile devices:

    Vertical-align

    This property affects the vertical positioning inside a line box of the

    boxes generated by an inline-level element.

    Text-decoration

    This property describes decorations that are added to the text of an

    element using the element's color. When specified on or propagated to

    an inline element, it affects all the boxes generated by that element, and

    is further propagated to any in-flow block-level boxes that split the

    inline.

    Value: none | [ underline || overline || line-through || blink ] | inherit

    Letter-spacing

    This property specifies spacing behavior between text characters.

    Text-indent

    This property specifies the indentation of the first line of text in a block

    container. More precisely, it specifies the indentation of the first box

    that flows into the block's first line box. The box is indented with respect

    to the left (or right, for right-to-left layout) edge of the line box. User

    agents must render this indentation as blank space.

    Reference URL: http://www.w3.org/TR/css-mobile/
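    Here is a sketch that exercises the four properties above for small screens. The media-query breakpoint, selectors and values are assumptions for illustration, not values taken from the CSS Mobile Profile.

```css
/* Hypothetical small-screen rules. */
@media (max-width: 600px) {
  sup {
    vertical-align: super;   /* raise the box within the line box */
    font-size: 0.75em;
  }
  a {
    text-decoration: underline;
  }
  .spaced {
    letter-spacing: 0.05em;
  }
  p {
    text-indent: 1em;        /* indents only the first line of each paragraph */
  }
}
```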


    CSS Speech Module Requirements:

    The CSS Speech module provides properties that enable authors to declaratively

    control presentational aspects of the aural dimension (e.g. TTS voice, pitch, rate,

    and volume levels). These style sheet properties can be used together with visual

    properties (mixed media), or as a complete aural alternative to a visual

    presentation.

    Typical examples include in-car use of an e-book reader, industrial and medical

    documentation systems, home entertainment, helping users to learn reading, or

    supporting users who have reading difficulties (print disabilities).

    Properties

    voice-volume

    The voice-volume property allows authors to control the amplitude of

    the audio waveform generated by the speech synthesizer, and is also

    used to adjust the relative volume level of audio cues within the audio

    "box" model.

    voice-balance

    The voice-balance property controls the spatial distribution of audio

    output across a lateral sound stage: one extremity is on the left, the

    other extremity is on the right hand side, relative to the listener's

    position.

    speak

    The speak property determines whether or not to render text aurally.

    speak-as

    The speak-as property determines in what manner text gets rendered

    aurally, based upon a basic predefined list of possible values.


    Pause properties

    The pause-before and pause-after properties specify a prosodic boundary (silence with a specific duration) that occurs before (or after) the speech synthesis rendition of the selected element, or, if any cue-before (or cue-after) is specified, before (or after) the cue within the audio "box" model.

    Rest properties

    The rest-before and rest-after properties specify a prosodic boundary

    (silence with a specific duration) that occurs before (or after) the speech

    synthesis rendition of an element within the audio "box" model.

    Cue properties

    The cue-before and cue-after properties specify auditory icons (i.e.

    pre-recorded / pre-generated sound clips) to be played before (or after)

    the selected element within the audio "box" model.

    Voice characteristic properties

    a. voice-family

    b. voice-rate

    c. voice-pitch

    d. voice-range

    e. voice-stress

    f. voice-duration

    Reference URL: http://www.w3.org/TR/css3-speech/
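    Here is a sketch combining several CSS Speech properties, with example values from the module's value grammar. The audio file path is hypothetical. Support in browsers and EPUB reading systems is limited, so treat this as illustrative rather than something that can be relied on.

```css
h1 {
  voice-family: female;                 /* generic voice */
  voice-volume: loud;
  voice-rate: slow;
  voice-pitch: medium;
  pause-after: strong;                  /* prosodic boundary after the heading */
  cue-before: url("sounds/chime.wav");  /* hypothetical audio cue */
}

.price {
  speak-as: digits;                     /* "12" is spoken as "one two" */
}

.decorative {
  speak: never;                         /* not rendered aurally */
}

blockquote {
  voice-balance: left;                  /* place the voice on the listener's left */
  rest-before: weak;
}
```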


    E-publishing survey

    Types of survey:

    1. Online survey

    2. Offline survey

    1 Online survey: Survey by online form submission.

    a) Online analysis

    1. Search through websites:

    1.1 The categories of websites

    a. Online newspapers publishers

    b. Portals such as Rediffmail, Indiatimes, Yahoo, etc.

    c. E-publishers (e-book, magazines, entertainment)

    d. Mobile VAS content (as per the list provided by IAMAI)

    1.2 Things to be searched for on websites

    a. Encoding used

    b. File format

    c. Image format

    d. Number of languages used

    e. Frequency and circulation of publishing (daily/monthly, etc)

    f. Type of publication (nation/state)

    g. Mobile compatible or not

    h. Which fonts used for publishing

    i. Whether data is rendered flawlessly

    b) Offline analysis through e-publishers

    1. Survey through relevant contact persons belonging to the categories mentioned in section 1.1

    2. Collect the information by filling questionnaire manually


    2 Offline survey

    1. Offline newspapers publishers

    2. Offline magazine publishers

    3. Offline course materials publishers (school, institute, college, etc)

    4. Survey through advertisement, email, telephone

    Outcome:

    I. Free survey for the identification of the sources.

    II. Survey forms (placed as Annexure I) shall be collected from different organizations for 12 major languages (Hindi, Bangla, Punjabi, Gujarati, Marathi, Malayalam, Tamil, Telugu, Assamese, Oriya, Kannada and Manipuri). Data for the remaining languages shall be collected as available.

    III. A final report should be prepared that clearly brings out objective and concrete outcomes, so that they can be used for future actions.

    The final outcome should also help in the implementation of Indian languages text layout in the

    following areas:

    1. E-Publishing in Indian languages

    - Page Formats for Indian languages Documents.

    - Positioning of Running Heads and Page Numbers.

    - Positioning of Closing Brackets, Purnaviram at Line End

    - Vertical Writing Mode and Horizontal Writing Mode.

    - Paragraph Adjustment Rules.

    - Mixed Text Composition in Horizontal Writing Mode.

    - Mixed Text Composition in Vertical Writing Mode.

    2. CSS

    - Drop first letter

    - Vertical & horizontal writing

    - Line breaking

    - Indentation

    - Letter spacing

    - Underlining

    - CSS Embedded fonts


    Annexure I

    E-Publishing related questions

    A. General Questions:

    1. 1) Does your organization work on publishing in Indian languages?

    a. Yes b. No

    2) If no, do you have any plans to localize your content in Indian languages?

    2. Are you using e-publishing? If so, how does e-publishing supplement your publishing?

    a. Increase your circulation b. Increase revenue c. Advertisement only

    3. Is your organization also involved in Indian-language translation services?

    a. Yes b. No

    4. Which file format is most widely compatible in e-publishing?

    a. Doc

    b. PDF

    c. HTML

    d. Other (Specify)

    5. Which Indian languages are you using in e-publishing, and in which scripts?

    6. 1) Are you using Unicode for content declaration?

    a. Yes b. No

    2) If no, which font are you using?

    7. Which Encoding are you using for saving the data?

    a. Unicode b. ISFOC c. Proprietary Font d. Others


    8. Does your organization do web development in Indian languages?

    a. Yes b. No

    9. Does your organization follow the rules mentioned below?

    Vertical writing and horizontal writing

    Line Breaking Rules

    Ruby and Emphasis Dots

    10. What proactive measures has your organization taken to avoid bugs and problems regarding picture clarity and the simplified use of script and language?

    11. How much space or memory do you use on your web server, so that your data can be easily retrieved over the network?

    12. Which image format does your organization use for the images published in your paper?

    a. JPEG b. TIFF c. BMP d. Others

    13. Are there different formats for electronic/mobile publishing? a. Yes b. No

    14. Are your publications suitable for mobile communication devices? a. Yes b. No

    15. What are some EPUB-compatible readers/devices?

    a. Sony's Reader (Touch Edition)

    b. PRS-505

    c. Apple's iPhone

    d. Others


    16. What is your experience in mobile publishing?

    a. Satisfied b. Need improvement c. Unsatisfied d. Please specify...............................

    17. What are the trends of publishing?

    18. What types of problems do you face regarding publication?

    19. Frequency and Circulation of Publication?

    a. Daily b. Monthly c. Quarterly d. Others

    20. Which type of publication do you have?

    a. Nationwide b. Statewide c. Others

    B. Questions related to Indic Text layout for EPUB

    Please give the following information:

    1. EPUB content fidelity:

    1.1 Size of the two columns.

    1.2 Margins, padding, borders.

    1.3 Trim size and binding margins.

    1.4 Position of running head/page number

    1.5 Position of page number related to trim size

    1.6 Line Gap in horizontal writing mode

    2. Format of Table of Contents.

    3. How do you handle an incomplete number of lines on a multi-column page?

    4. How do you arrange tables rotated 90 degrees counterclockwise?

    5. How do you arrange lines that contain multiple illustrations/images?