Vol. 2, No. 2 ------------ < for cleo community> --------- March 14, 1993


Notes from the editor..............Section 1/4

ISO................................Section 2/4

Unicode on Ethiopic................Section 3/4

Section 1/4

Notes from the Editor

###Ethio Science & Technology### is a weekly column presented to the Cleo community. It covers various issues in science, in particular those most closely related to Ethiopia.

The last issue of ###Ethio Science & Technology### raised issues regarding the standardization of Ethiopic. Today's column introduces ISO, an international organization, and presents a very short summary of the Unicode technical report on Ethiopic.

My view on the technical report will follow in the next column. I felt it would be too boring and cumbersome for readers if I added it here. Besides, I am not even sure whether this column is already so technical that it obscures the underlying message. I believe this is an important issue for everyone, and if not all, at least most of us should get involved, learn about the problem, and do something about it.

If you haven't seen last week's column, it would be helpful to look at it beforehand.

Some important terms:

ISO: The International Organization for Standardization

ISO-10646: An official ISO standard for the written form of all languages.

Ethiopic: The Ge'ez alphabet used by Ethiopian languages such as Amharic, Oromegna, Tigregna, and others.

symbol: A written figure that represents a sound. For example, "a", "b", and "c".

code: A unique number assigned to a symbol (a character), which can be written in DECIMAL, OCTAL, or HEXADECIMAL form.

ASCII: American Standard Code for Information Interchange.

Unicode: The Unicode Consortium, the organization behind the Unicode character encoding.
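To make the definition of "code" above concrete, here is a minimal Python sketch that prints the code of a few symbols in all three forms. The helper name code_forms is my own, chosen only for illustration; ord and format are standard Python built-ins.

```python
def code_forms(symbol):
    """Return the code of a one-character symbol in decimal, octal, and hexadecimal."""
    code = ord(symbol)  # the number assigned to the symbol
    return {
        "decimal": str(code),
        "octal": format(code, "o"),
        "hexadecimal": format(code, "X"),
    }


for symbol in "abc":
    print(symbol, code_forms(symbol))
```

The same number is shown three ways; only the numbering base changes, not the code itself.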

Section 2/4

ISO (International Organization for Standardization)

ISO, the International Organization for Standardization, located in Geneva, is the highest international body for bringing national standards together and incorporating them into international standards. Several nations have their own organizations that set up national standards. For instance, the following are some of the organizations that establish standards for their respective nations:

American National Standards Institute (ANSI) USA

Deutsches Institut fuer Normung (DIN) Germany

British Standards Institution (BSI) United Kingdom

Association francaise de normalisation (AFNOR) France

Ente Nazionale Italiano di Unificazione (UNI) Italy

ISO-10646 is one of the many standards set up by ISO. If I may quote Glen Adams, here is what he had to say about 10646:

10646 is now an official ISO standard which defines a character repertoire *aimed* at covering the written form of all languages. [Some languages cannot be represented yet since the scripts used to write them have yet to be incorporated into 10646; namely, the writing systems based on the Burmese, Ethiopic, Khmer, Mongolian, Sinhalese, and Tibetan scripts. These are planned for a future version along with other archaic and less-used scripts. ]

Yes, Ethiopic is under consideration. Since ISO and Unicode have agreed that the Unicode 16-bit encoding is to become a subset of the ISO-10646 standard, the current draft proposal by the Unicode technical committee has a direct bearing on the outcome. In short, it is plausible to assume that once Unicode approves its current draft proposal, the work will become part of an international standard. All published ISO work can be purchased from:

Phillips Business Information

7811 Montrose Rd.

Potomac, MD 20854

Tel. 1-800-OMNICOM and (301)-424-3338

Section 3/4

Unicode on Ethiopic (Ethiopian Script)

The Unicode technical committee has already presented its draft proposal on Ethiopic, which is under review until August 15, 1993. The report can be obtained via FTP from ``Unicode.org''; if you don't have FTP access, please don't hesitate to ask for a copy. I can mail you one!

The proposal on Ethiopic was written by Joe Becker. It has been reviewed internally, and it is the view of the committee that the report is somewhat stable, even though some questions and problems remain unsolved.


Before addressing the code points of Ethiopic in the body of Unicode, let's review 16-bit encoding. You remember that an ASCII character is stored in one 8-bit byte (standard ASCII itself uses only 7 of those bits, for 128 codes). Unlike ASCII, Unicode uses 16 bits (two bytes) to represent each character. What does this mean? Whereas a one-byte character set allows at most 256 symbols, the Unicode 16-bit encoding can accommodate 65,536 symbols (about 65,000).
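The figures in this comparison follow from a simple rule: n bits give 2 to the power n distinct codes. A minimal Python sketch of that arithmetic follows; the function name symbols_for_bits is only an illustrative choice of mine.

```python
def symbols_for_bits(bits):
    """Number of distinct codes that fit in the given number of bits."""
    return 2 ** bits


# 7 bits: standard ASCII; 8 bits: one byte; 16 bits: two bytes
for bits in (7, 8, 16):
    print(f"{bits:2d} bits -> {symbols_for_bits(bits):,} codes")
```

Doubling the width from one byte to two does not double the room; it multiplies it by 256.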

One of Unicode's objectives is to include the written forms of all languages in its standard. To that end, it assigns each script a block of codes used to represent its symbols. Unicode uses hexadecimal (the base 16 numbering system) for its encoding. According to the report, the code points assigned for Ethiopic are:

``Ethiopian U+1200 U+125F.'' That is, 1200-125F (hexadecimal), or 4608-4703 (decimal).

One can only represent 96 symbols in this block!
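Readers who want to check the block arithmetic themselves can do so with a few lines of Python. This is only a sketch: the names BLOCK_START, BLOCK_END, and block_size are mine, and the two end points are the ones quoted from the draft report. Hexadecimal 1200 is decimal 4608, hexadecimal 125F is decimal 4703, and counting both ends gives 96 positions.

```python
BLOCK_START = 0x1200  # first code point quoted from the draft report
BLOCK_END = 0x125F    # last code point quoted from the draft report


def block_size(start, end):
    """Count the code points in a block, including both end points."""
    return end - start + 1


print("decimal range:", BLOCK_START, "-", BLOCK_END)
print("positions in block:", block_size(BLOCK_START, BLOCK_END))
```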

Encoding Ethiopic

As you remember, our alphabet has more than 33 basic letters, and each letter has 6 further vowel forms (syllables). Counting the basic form plus its 6 vowel forms for every letter, that brings the number of characters above 231 (33 x 7). This doesn't include numbers, punctuation marks, and ``diqala fid'lat.''

The report defines 37 basic letters, which it calls glyph forms; 13 vowel phonetic letters; 20 numbers; and 7 punctuation marks. It doesn't recognize the syllables beyond the basic letters as independent entities (glyphs). In other words, it doesn't assign codes to them; instead, it assumes that the 13 vowel phonetic letters combine with the basic letters to form the syllables.
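The counts above can be put side by side with the size of the block. The Python sketch below does only that bookkeeping. It assumes, as my own reading, that each of the 33 basic letters is counted in 7 forms (the basic form plus 6 vowel forms); the inventory figures are the ones the report gives, and all variable names are illustrative.

```python
BLOCK_SIZE = 0x125F - 0x1200 + 1  # positions in the draft's Ethiopic block

# Assumption: 33 basic letters, each in a basic form plus 6 vowel forms.
full_syllabary = 33 * 7

# Inventory as listed in the draft report.
draft_inventory = {
    "basic letters": 37,
    "vowel phonetic letters": 13,
    "numbers": 20,
    "punctuation marks": 7,
}
draft_total = sum(draft_inventory.values())

print("block size:", BLOCK_SIZE)
print("one code per syllable would need at least:", full_syllabary)
print("draft inventory needs:", draft_total)
print("draft inventory fits in block:", draft_total <= BLOCK_SIZE)
```

Seen this way, the composing scheme is what lets the draft fit into so small a block; giving every syllable its own code would not.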

For instance, if you type "M", you get "m",

if you type "MA", you get "ma",

if you type "MAE", you get "mi", and so on.

Notice here that "M", "A", and "E" each have their own unique code, but "ma" and "mi" do not. It is assumed that any software that implements Ethiopic will adhere to this scheme.
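To picture how software might follow such a scheme, here is a small Python sketch. It is not taken from the report. The code values given to "M", "A", and "E" are hypothetical placeholders inside the 1200-125F block, the GLYPHS table holds only the three examples above, and the function names encode and render are mine. The point is simply that only the typed letters carry codes, while the syllable shown on screen is looked up from the sequence.

```python
# Hypothetical code values, for illustration only (not from the report).
BASE_LETTERS = {"M": 0x1200}
VOWEL_LETTERS = {"A": 0x1240, "E": 0x1241}

# Syllables have no code of their own; each is selected by a code sequence.
GLYPHS = {
    (0x1200,): "m",
    (0x1200, 0x1240): "ma",
    (0x1200, 0x1240, 0x1241): "mi",
}


def encode(keys):
    """Turn typed keys into the stored sequence of codes."""
    table = {**BASE_LETTERS, **VOWEL_LETTERS}
    return [table[key] for key in keys]


def render(codes):
    """Group each base letter with the vowel letters after it, then look up the glyph."""
    base_codes = set(BASE_LETTERS.values())
    glyphs = []
    i = 0
    while i < len(codes):
        j = i + 1
        # Extend the group until the next base letter (or the end of input).
        while j < len(codes) and codes[j] not in base_codes:
            j += 1
        glyphs.append(GLYPHS[tuple(codes[i:j])])
        i = j
    return glyphs


print(render(encode("MMAMAE")))
```

The sketch assumes the input always starts with a base letter; a real implementation would also have to decide what to do with a stray vowel letter.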

If you are familiar with the Ethiopian typewriter, the concept is exactly the same. To type "c" on an Ethiopian typewriter, you must first type "b", then a bar to make "c". The letter you see on the typewriter key is "b", not "c".

Encoding Order (kid-em teq-tel)

The report adopts the current order with slight changes.

--to be continued

Abass Belay Alamnehe