The idea for the QurASCII transliteration came up in the course of transforming the Tanzil Quran (published on the internet in 2007) into a relational Access database during Ramadan 2008.
The goal was to use only ASCII characters (the 'Basic Latin' block, '0000' to '007F', of Unicode) in order to create a fully reversible transliteration of the Holy Quran. As the characters '0000' through '0020' and '007F' are used for control characters and the space, and '0030' through '0039' for digits, 84 ASCII characters remained for this task.
As a first approach, the Buckwalter transliteration was used. However, it had some deficits with respect to modern HTML/XML representation of Arabic text, and many special characters and diacritics used in the Holy Quran were missing, so a slightly different approach had to be chosen.
The basic idea was to use only ASCII consonants (and no special characters) to transliterate the Arabic consonants (except Alif, which is treated as a carrier and vowel lengthener). For the diacritic vowels, the corresponding ASCII vowels were used. The remaining vowels were assigned to 'Alif Madda', 'Alif Wasla' and 'Alif Maksura'. For plain 'Alif' the exclamation mark was used, as it has a similar shape.
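The count of 84 can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are illustrative, and the space ('0020') is grouped with the control characters, as in the text.

```python
# Count the ASCII code points left over for letters and diacritics.
excluded_control = set(range(0x00, 0x21)) | {0x7F}  # '0000'..'0020' and '007F'
excluded_digits = set(range(0x30, 0x3A))            # '0030'..'0039'
excluded = excluded_control | excluded_digits

# Every remaining code point of the 'Basic Latin' block ('0000'..'007F').
available = [chr(cp) for cp in range(0x80) if cp not in excluded]

print(len(available))      # 128 - 34 - 10 = 84
print("".join(available))  # the printable characters that remain
```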
The resulting deviations from the Buckwalter Transliteration are listed in the following table:
Arabic | Buckwalter | QurASCII | Explanation |
---|---|---|---|
ء | ' | @ | HTML reserved character; @ is similar to a |
آ | \| | O | special character; long Alif => O |
أ | > | % | HTML reserved character |
ؤ | & | W | HTML reserved character => W |
إ | < | , | HTML reserved character |
ئ | } | Y | => Y (analogous to W for Waw with Hamza above) |
ا | A | ! | as A is used for Fathatan; ! has similar shape |
ذ | * | g | g as last available small consonant |
ش | $ | c | c as in German sch |
غ | g | G | use of capital letter G was more appropriate |
ى | y | e | e as last remaining small vowel |
ي | Y | y | small y as in DIN & ISO |
ً | F | A | capital A for an |
ٌ | N | U | capital U for un |
ٍ | K | I | capital I for in |
ّ | ~ | = | duplication |
ْ | o | ` | absence of vowel = accent |
ٰ | ` | - | elongation |
ٱ | { | o | voiceless Alif => o |
As manual transliteration in a database was of course not feasible, I decided to teach myself the open-source programming language Perl, which seemed to be the best choice for the task of text analysis and conversion.
After extended testing and optimization of the Quran data handling through object-oriented programming, I was finally able in 2011 to launch the first version of my 'Quran-Explorer', comprising a full cross-reference of the words used and a rudimentary search function.
In order to optimize the search, the following modification was made to the original transliteration:
Arabic | Before | After | Explanation |
---|---|---|---|
ّ | = | : | in order to avoid the form control symbol = |
Additionally, the 'Combining Diacritical Marks' (Unicode range '0653' to '0655') and the 'Quranic Annotation Signs' (Unicode range '06D6' to '06ED') were assigned as per the following table:
Arabic | QurASCII | Explanation |
---|---|---|
ٓ | ~ | same shape |
ٔ | ^ | |
ٕ | = | |
ۖ | < | pause mark => used HTML reserved character |
ۗ | > | pause mark => used HTML reserved character |
ۘ | ; | pause mark => used special character |
ۙ | & | pause mark => used HTML reserved character |
ۚ | ' | pause mark => used HTML reserved character |
ۛ | " | pause mark => used HTML reserved character |
ۜ | J | |
۝ | . | verse mark => used special character |
۞ | # | verse mark => used special character |
۟ | R | |
۠ | Q | |
ۡ | * | |
ۢ | M | |
ۣ | K | |
ۤ | ? | |
ۥ | V | |
ۦ | X | |
ۧ | + | |
ۨ | N | |
۩ | P | prostration mark |
۪ | B | |
۫ | C | |
۬ | F | |
ۭ | L |
In 2013 the full-text search for the Quran was optimized. During that time I identified problems with the characters '?' and '^', which are metacharacters in Perl's regular expressions. To enable searching for all characters used in the Tanzil Quran, the following additional modifications had to be made:
Arabic | Before | After | Explanation |
---|---|---|---|
ٔ | ^ | { | to avoid control character for regular expression |
ٕ | = | } | moved according to other 'Hamza' |
ۤ | ? | = | to avoid control character for regular expression |
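The kind of problem that '?' and '^' cause in a search pattern can be reproduced in a few lines. The sketch below uses Python's `re` module rather than the Perl code of the Quran-Explorer (which is not shown here), and the sample string is made up. In Perl, the corresponding escaping tools are `quotemeta` and `\Q...\E`.

```python
import re

text = "a?b^c"  # hypothetical sample string, not real QurASCII data

# Unescaped, "^" is an anchor: it matches the empty string at the start
# of the text instead of the literal caret.
unescaped_hits = re.findall("^", text)

# "?" on its own is a quantifier with nothing to repeat, so the pattern
# is rejected when it is compiled.
try:
    re.compile("?")
    question_mark_compiles = True
except re.error:
    question_mark_compiles = False

# Escaping turns both characters back into plain literals.
literal_hits = [re.findall(re.escape(ch), text) for ch in "?^"]

print(unescaped_hits, question_mark_compiles, literal_hits)
```

Moving the affected signs to other ASCII characters, as done in the table above, avoids having to escape these two characters in every search pattern.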
The resulting overall QurASCII table, assigning all ASCII characters '0000' through '007F' to the related Arabic characters, is given below:
UC | ASCII | ARABIC CHARACTER | UC | UNICODE NAME | ||||
---|---|---|---|---|---|---|---|---|
0000 | NUL | NUL | 0000 | NULL | ||||
0001 | SOH | SOH | 0001 | START OF HEADING | ||||
0002 | STX | STX | 0002 | START OF TEXT | ||||
0003 | ETX | ETX | 0003 | END OF TEXT | ||||
0004 | EOT | EOT | 0004 | END OF TRANSMISSION | ||||
0005 | ENQ | ENQ | 0005 | ENQUIRY | ||||
0006 | ACK | ACK | 0006 | ACKNOWLEDGE | ||||
0007 | BEL | BEL | 0007 | BELL | ||||
0008 | BS | BS | 0008 | BACKSPACE | ||||
0009 | HT | HT | 0009 | CHARACTER TABULATION | ||||
000A | LF | LF | 000A | LINE FEED | ||||
000B | VT | VT | 000B | LINE TABULATION | ||||
000C | FF | FF | 000C | FORM FEED | ||||
000D | CR | CR | 000D | CARRIAGE RETURN | ||||
000E | SO | SO | 000E | SHIFT OUT | ||||
000F | SI | SI | 000F | SHIFT IN | ||||
0010 | DLE | DLE | 0010 | DATA LINK ESCAPE | ||||
0011 | DC1 | DC1 | 0011 | DEVICE CONTROL ONE | ||||
0012 | DC2 | DC2 | 0012 | DEVICE CONTROL TWO | ||||
0013 | DC3 | DC3 | 0013 | DEVICE CONTROL THREE | ||||
0014 | DC4 | DC4 | 0014 | DEVICE CONTROL FOUR | ||||
0015 | NAK | NAK | 0015 | NEGATIVE ACKNOWLEDGE | ||||
0016 | SYN | SYN | 0016 | SYNCHRONOUS IDLE | ||||
0017 | ETB | ETB | 0017 | END OF TRANSMISSION BLOCK | ||||
0018 | CAN | CAN | 0018 | CANCEL | ||||
0019 | EM | EM | 0019 | END OF MEDIUM | ||||
001A | SUB | SUB | 001A | SUBSTITUTE | ||||
001B | ESC | ESC | 001B | ESCAPE | ||||
001C | FS | FS | 001C | INFORMATION SEPARATOR FOUR | ||||
001D | GS | GS | 001D | INFORMATION SEPARATOR THREE | ||||
001E | RS | RS | 001E | INFORMATION SEPARATOR TWO | ||||
001F | US | US | 001F | INFORMATION SEPARATOR ONE | ||||
0020 | SP | SP | 0020 | SPACE | ||||
0021 | ! | ا | 0627 | ARABIC LETTER ALEF | ||||
0022 | " | ۚ | 06DA | ARABIC SMALL HIGH JEEM | ||||
0023 | # | ۞ | 06DE | ARABIC START OF RUB EL HIZB | ||||
0024 | $ | - | not used | |||||
0025 | % | أ | 0623 | ARABIC LETTER ALEF WITH HAMZA ABOVE | ||||
0026 | & | ۙ | 06D9 | ARABIC SMALL HIGH LAM ALEF | ||||
0027 | ' | ۛ | 06DB | ARABIC SMALL HIGH THREE DOTS | ||||
0028 | ( | - | not used | |||||
0029 | ) | - | not used | |||||
002A | * | ۡ | 06E1 | ARABIC SMALL HIGH DOTLESS HEAD OF KHAH | ||||
002B | + | ۧ | 06E7 | ARABIC SMALL HIGH YEH | ||||
002C | , | إ | 0625 | ARABIC LETTER ALEF WITH HAMZA BELOW | ||||
002D | - | ٰ | 0670 | ARABIC LETTER SUPERSCRIPT ALEF | ||||
002E | . | ۝ | 06DD | ARABIC END OF AYAH | ||||
002F | / | - | not used | |||||
0030 | 0 | ٠ | 0660 | ARABIC-INDIC DIGIT ZERO | ||||
0031 | 1 | ١ | 0661 | ARABIC-INDIC DIGIT ONE | ||||
0032 | 2 | ٢ | 0662 | ARABIC-INDIC DIGIT TWO | ||||
0033 | 3 | ٣ | 0663 | ARABIC-INDIC DIGIT THREE | ||||
0034 | 4 | ٤ | 0664 | ARABIC-INDIC DIGIT FOUR | ||||
0035 | 5 | ٥ | 0665 | ARABIC-INDIC DIGIT FIVE | ||||
0036 | 6 | ٦ | 0666 | ARABIC-INDIC DIGIT SIX | ||||
0037 | 7 | ٧ | 0667 | ARABIC-INDIC DIGIT SEVEN | ||||
0038 | 8 | ٨ | 0668 | ARABIC-INDIC DIGIT EIGHT | ||||
0039 | 9 | ٩ | 0669 | ARABIC-INDIC DIGIT NINE | ||||
003A | : | ّ | 0651 | ARABIC SHADDA | ||||
003B | ; | ۘ | 06D8 | ARABIC SMALL HIGH MEEM INITIAL FORM | ||||
003C | < | ۖ | 06D6 | ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA | ||||
003D | = | ۤ | 06E4 | ARABIC SMALL HIGH MADDA | ||||
003E | > | ۗ | 06D7 | ARABIC SMALL HIGH LIGATURE QAF WITH LAM WITH ALEF MAKSURA | ||||
003F | ? | - | not used | |||||
0040 | @ | ء | 0621 | ARABIC LETTER HAMZA | ||||
0041 | A | ً | 064B | ARABIC FATHATAN | ||||
0042 | B | ۪ | 06EA | ARABIC EMPTY CENTRE LOW STOP | ||||
0043 | C | ۫ | 06EB | ARABIC EMPTY CENTRE HIGH STOP | ||||
0044 | D | ض | 0636 | ARABIC LETTER DAD | ||||
0045 | E | ع | 0639 | ARABIC LETTER AIN | ||||
0046 | F | ۬ | 06EC | ARABIC ROUNDED HIGH STOP WITH FILLED CENTRE | ||||
0047 | G | غ | 063A | ARABIC LETTER GHAIN | ||||
0048 | H | ح | 062D | ARABIC LETTER HAH | ||||
0049 | I | ٍ | 064D | ARABIC KASRATAN | ||||
004A | J | ۜ | 06DC | ARABIC SMALL HIGH SEEN | ||||
004B | K | ۣ | 06E3 | ARABIC SMALL LOW SEEN | ||||
004C | L | ۭ | 06ED | ARABIC SMALL LOW MEEM | ||||
004D | M | ۢ | 06E2 | ARABIC SMALL HIGH MEEM ISOLATED FORM | ||||
004E | N | ۨ | 06E8 | ARABIC SMALL HIGH NOON | ||||
004F | O | آ | 0622 | ARABIC LETTER ALEF WITH MADDA ABOVE | ||||
0050 | P | ۩ | 06E9 | ARABIC PLACE OF SAJDAH | ||||
0051 | Q | ۠ | 06E0 | ARABIC SMALL HIGH UPRIGHT RECTANGULAR ZERO | ||||
0052 | R | ۟ | 06DF | ARABIC SMALL HIGH ROUNDED ZERO | ||||
0053 | S | ص | 0635 | ARABIC LETTER SAD | ||||
0054 | T | ط | 0637 | ARABIC LETTER TAH | ||||
0055 | U | ٌ | 064C | ARABIC DAMMATAN | ||||
0056 | V | ـۥ | 06E5 | ARABIC SMALL WAW | ||||
0057 | W | ؤ | 0624 | ARABIC LETTER WAW WITH HAMZA ABOVE | ||||
0058 | X | ـۦ | 06E6 | ARABIC SMALL YEH | ||||
0059 | Y | ئ | 0626 | ARABIC LETTER YEH WITH HAMZA ABOVE | ||||
005A | Z | ظ | 0638 | ARABIC LETTER ZAH | ||||
005B | [ | - | not used | |||||
005C | \ | - | not used | |||||
005D | ] | - | not used | |||||
005E | ^ | - | not used | |||||
005F | _ | ـ | 0640 | ARABIC TATWEEL | ||||
0060 | ` | ْ | 0652 | ARABIC SUKUN | ||||
0061 | a | َ | 064E | ARABIC FATHA | ||||
0062 | b | ب | 0628 | ARABIC LETTER BEH | ||||
0063 | c | ش | 0634 | ARABIC LETTER SHEEN | ||||
0064 | d | د | 062F | ARABIC LETTER DAL | ||||
0065 | e | ى | 0649 | ARABIC LETTER ALEF MAKSURA | ||||
0066 | f | ف | 0641 | ARABIC LETTER FEH | ||||
0067 | g | ذ | 0630 | ARABIC LETTER THAL | ||||
0068 | h | ه | 0647 | ARABIC LETTER HEH | ||||
0069 | i | ِ | 0650 | ARABIC KASRA | ||||
006A | j | ج | 062C | ARABIC LETTER JEEM | ||||
006B | k | ك | 0643 | ARABIC LETTER KAF | ||||
006C | l | ل | 0644 | ARABIC LETTER LAM | ||||
006D | m | م | 0645 | ARABIC LETTER MEEM | ||||
006E | n | ن | 0646 | ARABIC LETTER NOON | ||||
006F | o | ٱ | 0671 | ARABIC LETTER ALEF WASLA | ||||
0070 | p | ة | 0629 | ARABIC LETTER TEH MARBUTA | ||||
0071 | q | ق | 0642 | ARABIC LETTER QAF | ||||
0072 | r | ر | 0631 | ARABIC LETTER REH | ||||
0073 | s | س | 0633 | ARABIC LETTER SEEN | ||||
0074 | t | ت | 062A | ARABIC LETTER TEH | ||||
0075 | u | ُ | 064F | ARABIC DAMMA | ||||
0076 | v | ث | 062B | ARABIC LETTER THEH | ||||
0077 | w | و | 0648 | ARABIC LETTER WAW | ||||
0078 | x | خ | 062E | ARABIC LETTER KHAH | ||||
0079 | y | ي | 064A | ARABIC LETTER YEH | ||||
007A | z | ز | 0632 | ARABIC LETTER ZAIN | ||||
007B | { | ٔ | 0654 | ARABIC HAMZA ABOVE | ||||
007C | \| | - | not used | |||||
007D | } | ٕ | 0655 | ARABIC HAMZA BELOW | ||||
007E | ~ | ٓ | 0653 | ARABIC MADDAH ABOVE | ||||
007F | DEL | DEL | 007F | DELETE | ||||
COLOR CODING LEGEND | ||||||||
ASCII control characters | ||||||||
HTML/XML reserved characters | ||||||||
Metacharacter in Perl regular expression | ||||||||
Consistent with UNGEGN, ALA-LC and mostly also ISO and DIN (except j for ج ) | ||||||||
Capital letters for "Tanween", "Hamza Carriers" and similar (dark) letters | ||||||||
As used in Buckwalter transliteration | ||||||||
Assignment of remaining letters | ||||||||
Quranic annotation signs (if used) | ||||||||
Pause marks and verse signs (if used) | ||||||||
Characters/diacritics not used in Tanzil Quran | ||||||||
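Because every ASCII character in the table stands for exactly one Arabic code point, the transliteration can be reversed by inverting the table. The following Python sketch illustrates this with a small subset of the rows above. It is not the Perl implementation behind the Quran-Explorer; the names `QURASCII`, `to_arabic` and `to_ascii` and the sample string are assumptions made for this illustration.

```python
# Subset of the table above: QurASCII character -> Unicode code point.
QURASCII = {
    "!": 0x0627,  # ARABIC LETTER ALEF
    "o": 0x0671,  # ARABIC LETTER ALEF WASLA
    "b": 0x0628,  # ARABIC LETTER BEH
    "s": 0x0633,  # ARABIC LETTER SEEN
    "m": 0x0645,  # ARABIC LETTER MEEM
    "l": 0x0644,  # ARABIC LETTER LAM
    "h": 0x0647,  # ARABIC LETTER HEH
    "a": 0x064E,  # ARABIC FATHA
    "i": 0x0650,  # ARABIC KASRA
    "u": 0x064F,  # ARABIC DAMMA
    "`": 0x0652,  # ARABIC SUKUN
    ":": 0x0651,  # ARABIC SHADDA
    "-": 0x0670,  # ARABIC LETTER SUPERSCRIPT ALEF
}

# str.translate expects code point -> code point mappings.
TO_ARABIC = {ord(ascii_char): cp for ascii_char, cp in QURASCII.items()}
TO_ASCII = {cp: ord(ascii_char) for ascii_char, cp in QURASCII.items()}

# Reversibility requires a one-to-one table: no Arabic code point twice.
assert len(TO_ASCII) == len(QURASCII)


def to_arabic(text: str) -> str:
    """Replace every mapped ASCII character; others pass through unchanged."""
    return text.translate(TO_ARABIC)


def to_ascii(text: str) -> str:
    """Inverse of to_arabic for the characters covered by the subset."""
    return text.translate(TO_ASCII)


sample = "bis`mi"  # illustrative input built only from characters in the subset
arabic = to_arabic(sample)
print([f"{ord(ch):04X}" for ch in arabic])
print(to_ascii(arabic) == sample)  # the round trip restores the input
```

A complete converter would fill the dictionary with all rows of the table; the one-to-one check then guards against accidentally assigning two ASCII characters to the same Arabic code point.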