12233-encoding-dup-indic.pdf


  • 7/28/2019 12233-encoding-dup-indic.pdf


    Comments on Encoding Duplicate Indic Characters

    Vinodh Rajan

    [email protected]

    This document is in response to the following recent documents submitted to the L2 registry, authored by Shriramana Sharma: Proposal to encode 0D5F Malayalam Letter Archaic II (L2/12-225) and Proposal to add two characters for Brahmi (L2/12-226).

    Encoding Written Forms as opposed to phonemes

    In L2/12-225 Shriramana Sharma recommends encoding an archaic form of the Malayalam independent vowel letter ī. Shriramana writes that Unicode encodes written forms and not the sounds thereof. The same maxim is repeated in L2/12-226 as well, to disunify the Tamil-specific Brahmi characters LLA & VIRAMA SIGN. While the statement is true, it must also be taken into account that Unicode does not recommend encoding all the glyphic variants of a character either.

    The Indic scripts usually have alternate independent forms for several letters, which cannot be just considered as simple glyphic variants of their base forms. It is not possible to encode all those archaic or alternate glyphs as independent characters in the UCS.

    Consider the below examples:

    [Figure: alternate letterforms in Siddham, Devanagari, Brahmi and Tamil; glyph images not reproduced.]

    L2/12-233

    Most probably this latter alternate form of Brahmi (which was more prevalent in Southern India & Sri Lanka) was the source of the corresponding alternate letterform in Siddham & Malayalam.

    Quite surprisingly, L2/12-226 doesn't seem to request an independent code point for this source variant.

    The interesting case is that of Tamil, where ī is the oft-used form, while the form derived independently from the corresponding short vowel i is the alternate form.

    The maxim that Unicode encodes written forms and not the sounds thereof cannot be over-applied to every one of the examples above. Any such application will probably result in the encoding of all the above sample glyphic variants as separate characters in the UCS.

    Such a move is quite unwarranted. This would set a serious precedent to encode a dozen or more glyphic variants as independent characters in the UCS. Paleographic origins of the letters are immaterial for the process of encoding.

    On dual encoding of /ra/ in Bengali Block & Devanagari Prishtamatra E

    U+09F0 & U+09B0 were needed to represent two different modern languages in plain text. As for Prishtamatra E, since there were reorderings involved, this subsequently posed problems for the rendering engines. Neither case is a precise precedent for encoding further duplicate characters.
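    The code points named in this section can be looked up in the Unicode Character Database. The following is a minimal Python sketch using only the standard unicodedata module. The helper name describe is illustrative, and U+094E as the code point of Prishthamatra E is an assumption taken from the Unicode charts, since the text above names the character without its code point.

```python
import unicodedata

# Code points discussed above. U+094E is an assumption taken from the
# Unicode charts; the text names Prishthamatra E without a code point.
CODE_POINTS = [0x09B0, 0x09F0, 0x094E]

def describe(cp):
    """Return 'U+XXXX NAME' for a code point, using the local UCD copy."""
    return "U+%04X %s" % (cp, unicodedata.name(chr(cp)))

for cp in CODE_POINTS:
    print(describe(cp))
```

    The two Bengali entries print as distinct characters with distinct names, which reflects their use for different modern languages rather than a glyphic distinction within one language.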

    For Archaic Malayalam II, both the alternate glyphs belong to the same language.

    For the Brahmi additions, the Brahmi block is itself highly unified in the UCS, which requires extensive tailoring for its implementation. In all probability, a Brahmi font has to be customized for a single variant and it cannot incorporate all the myriad variants found in the inscriptions.

    Brahmi Specific Dis-Unifications

    Just a few Brahmi characters cannot be disunified citing independent paleographic origins. It must be noted that independent Bhattiprolu variants are already unified with the existing characters. As noted earlier, while L2/12-226 requests the disunification of LLA and VIRAMA, it doesn't seem to recommend the disunification of the independent alternative form discussed above. Therefore, any disunification in the Brahmi block needs to be undertaken only after considering the overall unification/disunification paradigm of the entire block.

    Graphemic Segmentation

    Graphemic segmentation is quite complex for Indic scripts, again requiring several customizations, and is not a very strong argument to disunify the VIRAMA for Tamil Brahmi.

    Even for Devanagari (or any other Indic script), there cannot be a single uniform graphemic segmentation. For the same word, //, one rendering is of five graphemes, while another is of four graphemes, dependent on the conjunct behavior. Therefore, even for the same word, the behavior of the segmentation rules is dictated by the font, and therefore must be taken care of only at the application level.
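    The dependence of the grapheme count on conjunct behavior can be sketched in Python. This is only an illustration under stated assumptions: the function segment and its join_conjuncts flag are hypothetical, the rule is far simpler than any real segmentation algorithm, and the sample word (उपस्थित) is chosen here for the example and is not the word used in the original text.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA

def segment(text, join_conjuncts):
    """Split text into rough grapheme clusters (hypothetical, simplified rule).

    Combining marks always attach to the preceding cluster. When
    join_conjuncts is True, a letter that follows a virama also joins the
    preceding cluster, modelling a font that forms the conjunct.
    """
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        after_virama = bool(clusters) and clusters[-1].endswith(VIRAMA)
        if clusters and (is_mark or (join_conjuncts and after_virama)):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# Sample word chosen for this sketch: U, PA, SA, VIRAMA, THA, VOWEL SIGN I, TA
word = "\u0909\u092A\u0938\u094D\u0925\u093F\u0924"

print(len(segment(word, join_conjuncts=True)))   # conjunct formed
print(len(segment(word, join_conjuncts=False)))  # visible virama
```

    With the conjunct formed, the sample word yields four clusters; with a visible virama it yields five. The code point sequence is identical in both cases, so the difference comes entirely from the rendering assumption.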

    Font Level Handling

    As with all glyphic variations, the characters proposed in L2/12-226 & L2/12-225 must be handled at the font level.

    Already, the Malayalam Classical font available as a part of the Indolipi package (http://www.aai.uni-hamburg.de/indtib/INDOLIPI/Indolipi.htm) has the alternate ī at U+0D08, and supports the classical orthography of the Malayalam script, such as extensive conjunct behavior and ligated vowel signs for U, UU, Vocalic R and Vocalic RR, etc.

    In case of Brahmi, a single font cannot possibly support both the Tamil variant of Brahmi and other Brahmi variants at the same time. As discussed earlier, the font needs to be specifically tailored for the particular Brahmi variant. In the absence of such a use case, there are no possible issues in unifying the proposed new Brahmi characters with the existing characters.

    OpenType supports language tags, and Graphite has features which can be enabled. Hence, in case there are any specific cases where both the variants must be used in the same text, the above can be harnessed.
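    The idea of selecting a variant glyph per run, while keeping a single code point, can be modelled with a small Python sketch. It does not use any real font or shaping library: the glyph names, the tag "ARCH" (a made-up placeholder, not a registered OpenType language system tag) and the functions select_glyph and shape are all hypothetical, and a real implementation would express this inside the font's OpenType or Graphite tables.

```python
# Hypothetical glyph names; a real font would define its own.
DEFAULT_GLYPHS = {0x0D08: "ii.modern"}

# Language-specific overrides keyed by (tag, code point). "ARCH" is a
# made-up placeholder tag used only for this illustration.
LANGUAGE_GLYPHS = {("ARCH", 0x0D08): "ii.archaic"}

def select_glyph(code_point, language="dflt"):
    """Pick a glyph for one code point, preferring a language-specific form."""
    override = LANGUAGE_GLYPHS.get((language, code_point))
    return override if override is not None else DEFAULT_GLYPHS[code_point]

def shape(text, language="dflt"):
    """Map each character of a run to a glyph name under one language tag."""
    return [select_glyph(ord(ch), language) for ch in text]

# Two runs of the same text, tagged differently, in one document.
runs = [("\u0D08", "dflt"), ("\u0D08", "ARCH")]
glyphs = [shape(text, tag) for text, tag in runs]
print(glyphs)
```

    Both runs store the same character, U+0D08, and only the tag attached to the run decides which form is displayed. This is the sense in which the distinction stays at the font level rather than in the encoding.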

    Conclusion

    Based on all the aforementioned arguments, the UTC should not recommend the encoding of the glyphic variants proposed in L2/12-225 & L2/12-226, and should instead advise the unification of the characters with the already existing characters, to be handled at the font level.