History

QurASCII-Transliteration - Development History

The idea for the QurASCII transliteration arose while the Tanzil Quran (published on the internet in 2007) was being converted into a relational Access database during Ramadan 2008.

The goal was to use only ASCII characters (the 'Basic Latin' block, '0000' to '007F' in Unicode) in order to create a fully reversible transliteration of the Holy Quran. As the characters '0000' through '0020' and '007F' are control characters (plus the space) and '0030' through '0039' are digits, 84 ASCII characters remained for this task.
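The count of 84 can be checked with a few lines of Python. This is only an illustrative sketch (the author's own tooling was written in Perl); the variable names are made up for this example.

```python
# All 128 code points of the Basic Latin block (U+0000..U+007F).
basic_latin = [chr(cp) for cp in range(0x00, 0x80)]

# Excluded as described in the text: U+0000..U+0020 (control characters
# and the space), U+007F (DELETE) and the digits U+0030..U+0039.
excluded = set(range(0x00, 0x21)) | {0x7F} | set(range(0x30, 0x3A))

available = [c for c in basic_latin if ord(c) not in excluded]

print(len(available))      # 84
print("".join(available))  # the printable, non-digit ASCII characters
```

The arithmetic is 128 - 33 - 1 - 10 = 84.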

As a first approach, the Buckwalter transliteration was used. However, Buckwalter has some deficits with respect to modern HTML/XML representation of Arabic text, and many special characters and diacritics used in the Holy Quran are missing from it, so a slightly different approach had to be chosen.

The basic idea was to use only ASCII consonants (and no special characters) to transliterate the Arabic consonants (except Alif, which is treated as a carrier and vowel extension). The diacritic vowels were likewise assigned the corresponding ASCII vowels. The remaining vowels were distributed among 'Alif Madda', 'Alif Wasla' and 'Alif Maksura'. For plain 'Alif' the exclamation mark was used, as it has a similar shape.

The resulting deviations from the Buckwalter Transliteration are listed in the following table:

| Arabic | Buckwalter | QurASCII | Explanation |
|---|---|---|---|
| ء | ' | @ | HTML reserved character; @ is similar to a |
| آ | \| | O | special character; long Alif => O |
| أ | < | % | HTML reserved character |
| ؤ | & | W | HTML reserved character => W |
| إ | > | , | HTML reserved character |
| ئ | } | Y | => Y (as Waw with Hamza above) |
| ا | A | ! | A is used for Fathatan; ! has a similar shape |
| ذ | * | g | g as last available small consonant |
| ش | $ | c | c as in German sch |
| غ | g | G | use of capital letter G was more appropriate |
| ى | y | e | e as last remaining small vowel |
| ي | Y | y | small y as in DIN & ISO |
| ً | F | A | capital A for an |
| ٌ | N | U | capital U for un |
| ٍ | K | I | capital I for in |
| ّ | ~ | = | duplication |
| ْ | o | ` | absence of vowel = accent |
| ٰ | ` | - | elongation |
| ٱ | { | o | voiceless Alif => o |

As manual transliteration in a database was of course not feasible, I decided to teach myself the open-source programming language Perl, which seemed the best choice for textual analysis and conversion.

After extended testing and optimization of the Quran data handling through object-oriented programming, I was finally able in 2011 to launch the first version of my 'Quran-Explorer', comprising a full cross-reference of the words used and a rudimentary search function.

In order to optimize the search, the following modification was made to the original transliteration:

| Arabic | Before | After | Explanation |
|---|---|---|---|
| ّ | = | : | in order to avoid the form control symbol = |

In addition, the 'Combining Diacritical Marks' (Unicode range '0653' to '0655') and the 'Quranic Annotation Signs' (Unicode range '06D6' to '06ED') were assigned as per the following table:

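| Arabic | QurASCII | Explanation |
|---|---|---|
| ٓ | ~ | same shape |
| ٔ | ^ | |
| ٕ | = | |
| ۖ | < | pause mark => used HTML reserved character |
| ۗ | > | pause mark => used HTML reserved character |
| ۘ | ; | pause mark => used special character |
| ۙ | & | pause mark => used HTML reserved character |
| ۚ | ' | pause mark => used HTML reserved character |
| ۛ | " | pause mark => used HTML reserved character |
| ۜ | J | |
| ۝ | . | verse mark => used special character |
| ۞ | # | verse mark => used special character |
| ۟ | R | |
| ۠ | Q | |
| ۡ | * | |
| ۢ | M | |
| ۣ | K | |
| ۤ | ? | |
| ۥ | V | |
| ۦ | X | |
| ۧ | + | |
| ۨ | N | |
| ۩ | P | prostration mark |
| ۪ | B | |
| ۫ | C | |
| ۬ | F | |
| ۭ | L | |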

In 2013 the full-text search for the Quran was optimized. During that time I identified problems with the characters '?' and '^', which are metacharacters in Perl's regular expressions. To enable searching for all characters used in the Tanzil Quran, the following additional modifications had to be made:

| Arabic | Before | After | Explanation |
|---|---|---|---|
| ٔ | ^ | { | to avoid a regular-expression metacharacter |
| ٕ | = | } | moved to match the other 'Hamza' |
| ۤ | ? | = | to avoid a regular-expression metacharacter |
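The problem with '?' and '^' can be illustrated with a short sketch. It uses Python's `re` module rather than Perl, on the assumption that both treat these two characters as metacharacters in the same way; the sample string and the helper `find_literal` are made up for this example. Escaping is shown only to make the issue visible; the author's solution was to reassign the characters instead.

```python
import re

sample = "b^c?d"  # made-up string containing both problem characters

# Unescaped, '^' is an anchor: it matches the empty string at the start
# of the text instead of the literal caret at index 1.
anchor_match = re.search("^", sample)
print(anchor_match.start(), repr(anchor_match.group()))  # 0 ''

# Unescaped, a lone '?' is a quantifier with nothing to repeat,
# so the pattern does not even compile.
try:
    re.compile("?")
    compiled = True
except re.error:
    compiled = False
print(compiled)  # False

def find_literal(needle: str, haystack: str) -> int:
    """Return the index of the first literal occurrence, or -1."""
    match = re.search(re.escape(needle), haystack)
    return match.start() if match else -1

print(find_literal("^", sample))  # 1
print(find_literal("?", sample))  # 3
```

Remapping the affected signs to characters without a special meaning, as in the table above, removes the need to escape every search term.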

The resulting overall QurASCII table, assigning all ASCII characters '0000' through '007F' to the related Arabic characters, is given below:

| UC (ASCII) | ASCII | Arabic character | UC (Arabic) | Unicode name |
|---|---|---|---|---|
| 0000 | NUL | NUL | 0000 | NULL |
| 0001 | SOH | SOH | 0001 | START OF HEADING |
| 0002 | STX | STX | 0002 | START OF TEXT |
| 0003 | ETX | ETX | 0003 | END OF TEXT |
| 0004 | EOT | EOT | 0004 | END OF TRANSMISSION |
| 0005 | ENQ | ENQ | 0005 | ENQUIRY |
| 0006 | ACK | ACK | 0006 | ACKNOWLEDGE |
| 0007 | BEL | BEL | 0007 | BELL |
| 0008 | BS | BS | 0008 | BACKSPACE |
| 0009 | HT | HT | 0009 | CHARACTER TABULATION |
| 000A | LF | LF | 000A | LINE FEED |
| 000B | VT | VT | 000B | LINE TABULATION |
| 000C | FF | FF | 000C | FORM FEED |
| 000D | CR | CR | 000D | CARRIAGE RETURN |
| 000E | SO | SO | 000E | SHIFT OUT |
| 000F | SI | SI | 000F | SHIFT IN |
| 0010 | DLE | DLE | 0010 | DATA LINK ESCAPE |
| 0011 | DC1 | DC1 | 0011 | DEVICE CONTROL ONE |
| 0012 | DC2 | DC2 | 0012 | DEVICE CONTROL TWO |
| 0013 | DC3 | DC3 | 0013 | DEVICE CONTROL THREE |
| 0014 | DC4 | DC4 | 0014 | DEVICE CONTROL FOUR |
| 0015 | NAK | NAK | 0015 | NEGATIVE ACKNOWLEDGE |
| 0016 | SYN | SYN | 0016 | SYNCHRONOUS IDLE |
| 0017 | ETB | ETB | 0017 | END OF TRANSMISSION BLOCK |
| 0018 | CAN | CAN | 0018 | CANCEL |
| 0019 | EM | EM | 0019 | END OF MEDIUM |
| 001A | SUB | SUB | 001A | SUBSTITUTE |
| 001B | ESC | ESC | 001B | ESCAPE |
| 001C | FS | FS | 001C | INFORMATION SEPARATOR FOUR |
| 001D | GS | GS | 001D | INFORMATION SEPARATOR THREE |
| 001E | RS | RS | 001E | INFORMATION SEPARATOR TWO |
| 001F | US | US | 001F | INFORMATION SEPARATOR ONE |
| 0020 | SP | SP | 0020 | SPACE |
| 0021 | ! | ا | 0627 | ARABIC LETTER ALEF |
| 0022 | " | ۚ | 06DA | ARABIC SMALL HIGH JEEM |
| 0023 | # | ۞ | 06DE | ARABIC START OF RUB EL HIZB |
| 0024 | $ | - | | not used |
| 0025 | % | أ | 0623 | ARABIC LETTER ALEF WITH HAMZA ABOVE |
| 0026 | & | ۙ | 06D9 | ARABIC SMALL HIGH LAM ALEF |
| 0027 | ' | ۛ | 06DB | ARABIC SMALL HIGH THREE DOTS |
| 0028 | ( | - | | not used |
| 0029 | ) | - | | not used |
| 002A | * | ۡ | 06E1 | ARABIC SMALL HIGH DOTLESS HEAD OF KHAH |
| 002B | + | ۧ | 06E7 | ARABIC SMALL HIGH YEH |
| 002C | , | إ | 0625 | ARABIC LETTER ALEF WITH HAMZA BELOW |
| 002D | - | ٰ | 0670 | ARABIC LETTER SUPERSCRIPT ALEF |
| 002E | . | ۝ | 06DD | ARABIC END OF AYAH |
| 002F | / | - | | not used |
| 0030 | 0 | ٠ | 0660 | ARABIC-INDIC DIGIT ZERO |
| 0031 | 1 | ١ | 0661 | ARABIC-INDIC DIGIT ONE |
| 0032 | 2 | ٢ | 0662 | ARABIC-INDIC DIGIT TWO |
| 0033 | 3 | ٣ | 0663 | ARABIC-INDIC DIGIT THREE |
| 0034 | 4 | ٤ | 0664 | ARABIC-INDIC DIGIT FOUR |
| 0035 | 5 | ٥ | 0665 | ARABIC-INDIC DIGIT FIVE |
| 0036 | 6 | ٦ | 0666 | ARABIC-INDIC DIGIT SIX |
| 0037 | 7 | ٧ | 0667 | ARABIC-INDIC DIGIT SEVEN |
| 0038 | 8 | ٨ | 0668 | ARABIC-INDIC DIGIT EIGHT |
| 0039 | 9 | ٩ | 0669 | ARABIC-INDIC DIGIT NINE |
| 003A | : | ّ | 0651 | ARABIC SHADDA |
| 003B | ; | ۘ | 06D8 | ARABIC SMALL HIGH MEEM INITIAL FORM |
| 003C | < | ۖ | 06D6 | ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA |
| 003D | = | ۤ | 06E4 | ARABIC SMALL HIGH MADDA |
| 003E | > | ۗ | 06D7 | ARABIC SMALL HIGH LIGATURE QAF WITH LAM WITH ALEF MAKSURA |
| 003F | ? | - | | not used |
| 0040 | @ | ء | 0621 | ARABIC LETTER HAMZA |
| 0041 | A | ً | 064B | ARABIC FATHATAN |
| 0042 | B | ۪ | 06EA | ARABIC EMPTY CENTRE LOW STOP |
| 0043 | C | ۫ | 06EB | ARABIC EMPTY CENTRE HIGH STOP |
| 0044 | D | ض | 0636 | ARABIC LETTER DAD |
| 0045 | E | ع | 0639 | ARABIC LETTER AIN |
| 0046 | F | ۬ | 06EC | ARABIC ROUNDED HIGH STOP WITH FILLED CENTRE |
| 0047 | G | غ | 063A | ARABIC LETTER GHAIN |
| 0048 | H | ح | 062D | ARABIC LETTER HAH |
| 0049 | I | ٍ | 064D | ARABIC KASRATAN |
| 004A | J | ۜ | 06DC | ARABIC SMALL HIGH SEEN |
| 004B | K | ۣ | 06E3 | ARABIC SMALL LOW SEEN |
| 004C | L | ۭ | 06ED | ARABIC SMALL LOW MEEM |
| 004D | M | ۢ | 06E2 | ARABIC SMALL HIGH MEEM ISOLATED FORM |
| 004E | N | ۨ | 06E8 | ARABIC SMALL HIGH NOON |
| 004F | O | آ | 0622 | ARABIC LETTER ALEF WITH MADDA ABOVE |
| 0050 | P | ۩ | 06E9 | ARABIC PLACE OF SAJDAH |
| 0051 | Q | ۠ | 06E0 | ARABIC SMALL HIGH UPRIGHT RECTANGULAR ZERO |
| 0052 | R | ۟ | 06DF | ARABIC SMALL HIGH ROUNDED ZERO |
| 0053 | S | ص | 0635 | ARABIC LETTER SAD |
| 0054 | T | ط | 0637 | ARABIC LETTER TAH |
| 0055 | U | ٌ | 064C | ARABIC DAMMATAN |
| 0056 | V | ـۥۥ | 06E5 | ARABIC SMALL WAW |
| 0057 | W | ؤ | 0624 | ARABIC LETTER WAW WITH HAMZA ABOVE |
| 0058 | X | ـۦ | 06E6 | ARABIC SMALL YEH |
| 0059 | Y | ئ | 0626 | ARABIC LETTER YEH WITH HAMZA ABOVE |
| 005A | Z | ظ | 0638 | ARABIC LETTER ZAH |
| 005B | [ | - | | not used |
| 005C | \ | - | | not used |
| 005D | ] | - | | not used |
| 005E | ^ | - | | not used |
| 005F | _ | ـ | 0640 | ARABIC TATWEEL |
| 0060 | ` | ْ | 0652 | ARABIC SUKUN |
| 0061 | a | َ | 064E | ARABIC FATHA |
| 0062 | b | ب | 0628 | ARABIC LETTER BEH |
| 0063 | c | ش | 0634 | ARABIC LETTER SHEEN |
| 0064 | d | د | 062F | ARABIC LETTER DAL |
| 0065 | e | ى | 0649 | ARABIC LETTER ALEF MAKSURA |
| 0066 | f | ف | 0641 | ARABIC LETTER FEH |
| 0067 | g | ذ | 0630 | ARABIC LETTER THAL |
| 0068 | h | ه | 0647 | ARABIC LETTER HEH |
| 0069 | i | ِ | 0650 | ARABIC KASRA |
| 006A | j | ج | 062C | ARABIC LETTER JEEM |
| 006B | k | ك | 0643 | ARABIC LETTER KAF |
| 006C | l | ل | 0644 | ARABIC LETTER LAM |
| 006D | m | م | 0645 | ARABIC LETTER MEEM |
| 006E | n | ن | 0646 | ARABIC LETTER NOON |
| 006F | o | ٱ | 0671 | ARABIC LETTER ALEF WASLA |
| 0070 | p | ة | 0629 | ARABIC LETTER TEH MARBUTA |
| 0071 | q | ق | 0642 | ARABIC LETTER QAF |
| 0072 | r | ر | 0631 | ARABIC LETTER REH |
| 0073 | s | س | 0633 | ARABIC LETTER SEEN |
| 0074 | t | ت | 062A | ARABIC LETTER TEH |
| 0075 | u | ُ | 064F | ARABIC DAMMA |
| 0076 | v | ث | 062B | ARABIC LETTER THEH |
| 0077 | w | و | 0648 | ARABIC LETTER WAW |
| 0078 | x | خ | 062E | ARABIC LETTER KHAH |
| 0079 | y | ي | 064A | ARABIC LETTER YEH |
| 007A | z | ز | 0632 | ARABIC LETTER ZAIN |
| 007B | { | ٔ | 0654 | ARABIC HAMZA ABOVE |
| 007C | \| | - | | not used |
| 007D | } | ٕ | 0655 | ARABIC HAMZA BELOW |
| 007E | ~ | ٓ | 0653 | ARABIC MADDAH ABOVE |
| 007F | DEL | DEL | 007F | DELETE |

COLOR CODING LEGEND
ASCII control characters
HTML/XML reserved characters
Metacharacters in Perl regular expressions
Consistent with UNGEGN, ALA-LC and mostly also ISO and DIN (except j for ج )
Capital letters for "Tanween", "Hamza Carriers" and similar (dark) letters
As used in Buckwalter transliteration
Assignment of remaining letters
Quranic annotation signs (if used)
Pause marks and verse signs (if used)
Characters/diacritics not used in Tanzil Quran
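
Because every used ASCII character is assigned to exactly one Arabic character, the table can be applied in both directions. The following Python sketch illustrates this with a small excerpt of the table above. It is not the author's Perl implementation; the names `QURASCII_TO_ARABIC`, `to_qurascii` and `to_arabic` are hypothetical, and the sample phrase is built directly from code points.

```python
# Excerpt of the table above: QurASCII character -> Arabic character.
# A complete converter would list every assigned row of the table.
QURASCII_TO_ARABIC = {
    " ": " ",        # SPACE maps to itself
    "!": "\u0627",   # ARABIC LETTER ALEF
    "o": "\u0671",   # ARABIC LETTER ALEF WASLA
    "b": "\u0628",   # ARABIC LETTER BEH
    "s": "\u0633",   # ARABIC LETTER SEEN
    "m": "\u0645",   # ARABIC LETTER MEEM
    "l": "\u0644",   # ARABIC LETTER LAM
    "h": "\u0647",   # ARABIC LETTER HEH
    "a": "\u064E",   # ARABIC FATHA
    "i": "\u0650",   # ARABIC KASRA
    "u": "\u064F",   # ARABIC DAMMA
    "`": "\u0652",   # ARABIC SUKUN
    ":": "\u0651",   # ARABIC SHADDA
}

# Invert the mapping; equal sizes confirm that it is one-to-one,
# which is what makes the transliteration reversible.
ARABIC_TO_QURASCII = {ar: asc for asc, ar in QURASCII_TO_ARABIC.items()}
assert len(ARABIC_TO_QURASCII) == len(QURASCII_TO_ARABIC)

def to_qurascii(arabic_text: str) -> str:
    """Transliterate Arabic text; raises KeyError for unmapped characters."""
    return "".join(ARABIC_TO_QURASCII[ch] for ch in arabic_text)

def to_arabic(qurascii_text: str) -> str:
    """Reverse the transliteration; raises KeyError for unmapped characters."""
    return "".join(QURASCII_TO_ARABIC[ch] for ch in qurascii_text)

# Sample phrase assembled from code points covered by the excerpt.
phrase = ("\u0628\u0650\u0633\u0652\u0645\u0650 "
          "\u0671\u0644\u0644\u0651\u064E\u0647\u0650")

encoded = to_qurascii(phrase)
print(encoded)                      # bis`mi oll:ahi
print(to_arabic(encoded) == phrase) # True
```

Characters outside the excerpt raise a `KeyError` here, which makes missing table rows visible instead of silently dropping text.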