12233-encoding-dup-indic.pdf
Comments on Encoding Duplicate Indic Characters
Vinodh Rajan
This document is in response to the following recent documents submitted to the L2 registry, authored by Shriramana Sharma: Proposal to encode 0D5F Malayalam Letter Archaic II (L2/12-225) and Proposal to add two characters for Brahmi (L2/12-226).
Encoding Written Forms as opposed to phonemes
In L2/12-225, Shriramana Sharma recommends encoding an archaic form of the Malayalam independent vowel letter II. Shriramana writes, "Unicode encodes written forms and not the sounds thereof." The same maxim is repeated in L2/12-226 to disunify the Tamil-specific Brahmi characters LLA and VIRAMA SIGN. While the statement is true, it must also be taken into account that Unicode does not actually recommend encoding all the glyphic variants of a character either.
The Indic scripts usually have alternate independent forms for several letters, which cannot be considered mere glyphic variants of their base forms. It is not possible to encode all those archaic or alternate glyphs as independent characters in the UCS.
Consider the examples below:
[Glyph examples not preserved. Surviving labels: Siddham i, u; Devanagari a, jha, a; Brahmi]
L2/12-233
[Glyph examples not preserved. Surviving label: Tamil u]
Most probably this latter alternate form of Brahmi (which was more prevalent in Southern India and Sri Lanka) was the source of the corresponding alternate letterform in Siddham and Malayalam.
Quite surprisingly, L2/12-226 does not seem to request an independent code point for this source variant.
The interesting case is that of Tamil u, where one form is the oft-used form, while another, derived independently from the corresponding short vowel u, is the alternate form.
The maxim "Unicode encodes written forms and not the sounds thereof" cannot be over-applied to every one of the examples above. Any such application will probably result in the encoding of all the above sample glyphic variants as separate characters in the UCS.
Such a move is quite unwarranted. It would set a serious precedent to encode a dozen or more glyphic variants as independent characters in the UCS. Paleographic origins of the letters are immaterial for the process of encoding.
On dual encoding of /ra/ in Bengali Block & Devanagari Prishthamatra E
U+09F0 and U+09B0 were needed to represent two different modern languages in plain text. As for Prishthamatra E, since reorderings were involved, this subsequently posed problems for the rendering engines. Neither case is a precise precedent for encoding further duplicate characters.
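As a rough illustration of what "needed in plain text" means here, the sketch below looks up the two Bengali-block code points named above and checks that they stay distinct under normalization, so plain text alone can tell them apart. It assumes Python's standard `unicodedata` module; `describe` and `is_distinct_under_nfkc` are hypothetical helper names, and the code point U+094E for Prishthamatra E is an assumption, since the text gives only the character's name.

```python
import unicodedata

# U+09B0 and U+09F0 are the two code points named in the text.
# U+094E is an assumption: the text names Prishthamatra E but gives no code point.
CODE_POINTS = [0x09B0, 0x09F0, 0x094E]


def describe(cp: int) -> str:
    """Return 'U+XXXX NAME', or a marker if this Python build lacks the name."""
    name = unicodedata.name(chr(cp), "<no name in this Unicode version>")
    return f"U+{cp:04X} {name}"


def is_distinct_under_nfkc(a: int, b: int) -> bool:
    """True if the two code points remain different after NFKC normalization."""
    return unicodedata.normalize("NFKC", chr(a)) != unicodedata.normalize("NFKC", chr(b))


if __name__ == "__main__":
    for cp in CODE_POINTS:
        print(describe(cp))
    # The two /ra/ letters are not folded together by normalization.
    print(is_distinct_under_nfkc(0x09B0, 0x09F0))
```

A glyphic variant handled at the font level would, by contrast, share a single code point and be indistinguishable in such a lookup.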
For Archaic Malayalam II, both the alternate glyphs belong to the same language.
For the Brahmi additions, the Brahmi block is itself highly unified in the UCS, which requires extensive tailoring for its implementation. In all probability, a Brahmi font has to be customized for a single variant and cannot incorporate all the myriad variants found in the inscriptions.
Brahmi-Specific Disunifications
Just a few Brahmi characters cannot be disunified citing independent paleographic origins. It must be noted that independent Bhattiprolu variants are already unified with the existing characters. As noted earlier, while L2/12-226 requests the disunification of LLA and VIRAMA, it does not seem to recommend the disunification of the independent alternative form. Therefore, any disunification in the Brahmi block should be undertaken only after considering the overall unification/disunification paradigm of the entire block.
Graphemic Segmentation
Graphemic segmentation is quite complex for Indic scripts, again requiring several customizations, and is not a very strong argument for disunifying the VIRAMA for Tamil Brahmi.
Even for Devanagari (or any other Indic script), there cannot be a single uniform graphemic segmentation. For the same word, one rendering consists of five graphemes while another consists of four, depending on the conjunct behavior. Therefore, even for the same word, the behavior of the segmentation rules is dictated by the font, and must therefore be taken care of only at the application level.
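The dependence on conjunct behavior can be sketched with a deliberately simplified segmenter. This is not the Unicode default grapheme cluster algorithm, only a toy rule set assumed for illustration: combining marks attach to the preceding cluster, and a consonant following a virama joins that cluster only when the font is assumed to form the conjunct. The function name `segment`, the flag `conjuncts_formed`, and the sample word (vidyalaya) are all hypothetical; the word used in the original document did not survive extraction.

```python
import unicodedata
from typing import List

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA

# Hypothetical sample word, spelled out as code points:
# VA, vowel sign I, DA, VIRAMA, YA, vowel sign AA, LA, YA
SAMPLE_WORD = "\u0935\u093F\u0926\u094D\u092F\u093E\u0932\u092F"


def segment(text: str, conjuncts_formed: bool) -> List[str]:
    """Split text into clusters under a simplified, assumed rule set."""
    clusters: List[str] = []
    for ch in text:
        # Vowel signs and the virama are combining marks (Mn or Mc).
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        # A letter after a virama joins the cluster only if the font ligates.
        joins = (
            conjuncts_formed
            and bool(clusters)
            and clusters[-1].endswith(VIRAMA)
        )
        if clusters and (is_mark or joins):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


if __name__ == "__main__":
    print(len(segment(SAMPLE_WORD, conjuncts_formed=True)))
    print(len(segment(SAMPLE_WORD, conjuncts_formed=False)))
```

The underlying code point sequence is identical in both calls; only the assumed font behavior changes the cluster count, which is why the decision sits with the application rather than the encoding.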
Font Level Handling
As with all glyphic variations, the characters proposed in L2/12-226 and L2/12-225 must be handled at the font level.
Already, the Malayalam Classical font available as part of the Indolipi package (http://www.aai.unihamburg.de/indtib/INDOLIPI/Indolipi.htm) has the alternate II at U+0D08, and supports the classical orthography of the Malayalam script, such as extensive conjunct behavior and ligated vowel signs for U, UU, Vocalic R, Vocalic RR, etc.
In the case of Brahmi, a single font cannot possibly support both the Tamil variant of Brahmi and other Brahmi variants at the same time. As discussed earlier, the font needs to be specifically tailored for the particular Brahmi variant. In the absence of such a use case, there are no possible issues in unifying the proposed new Brahmi characters with the existing characters.
OpenType supports language tags, and Graphite has features which can be enabled. Hence, should there be any specific cases where both variants must be used in the same text, the above can be harnessed.
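The idea of selecting a variant by language tag while keeping a single code point can be modelled in a few lines. The sketch below is a plain Python simulation, not OpenType or Graphite code: the glyph names `ml_ii` and `ml_ii.archaic`, the tag `ARCH`, and the functions `select_glyph` and `shape` are all hypothetical, chosen only to mirror the kind of localized-form substitution a font could carry.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical default glyph for U+0D08 (MALAYALAM LETTER II).
DEFAULT_GLYPHS: Dict[int, str] = {0x0D08: "ml_ii"}

# Hypothetical per-language overrides, keyed by (code point, language tag).
# "ARCH" is an invented tag standing in for a run marked as archaic orthography.
LOCALIZED_GLYPHS: Dict[Tuple[int, str], str] = {
    (0x0D08, "ARCH"): "ml_ii.archaic",
}


def select_glyph(cp: int, lang_tag: Optional[str] = None) -> str:
    """Pick the glyph for a code point, preferring a language-specific form."""
    if lang_tag is not None:
        glyph = LOCALIZED_GLYPHS.get((cp, lang_tag))
        if glyph is not None:
            return glyph
    return DEFAULT_GLYPHS[cp]


def shape(text: str, lang_tag: Optional[str] = None) -> List[str]:
    """Map each character of a run to a glyph name under one language tag."""
    return [select_glyph(ord(ch), lang_tag) for ch in text]


if __name__ == "__main__":
    run = "\u0D08"
    print(shape(run))          # default form
    print(shape(run, "ARCH"))  # variant form, same underlying text
```

Two runs of the same text tagged differently would thus display both variants side by side without any additional code point.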
Conclusion
Based on all the aforementioned arguments, the UTC should not recommend the encoding of the glyphic variants proposed in L2/12-225 and L2/12-226, and should instead advise the unification of the characters with the already existing characters, to be handled at the font level.