Vol. 2, No. 1 ------------ <for Cleo community> --------- March 7, 1993


Notes from the editor..............Section 1/4

Symbols In Computer Eyes...........Section 2/4

ASCII coding scheme................Section 3/4

Unicode............................Section 4/4

Section 1/4

Notes from the Editor

###Ethio Science & Technology### is a weekly column presented to the Cleo community. It covers various issues in science, in particular those most related to Ethiopia. The column was absent from Cleo for about two months due to lack of time.

The last issue of ###Ethio Science & Technology### had two articles that were to be continued. Today, instead of continuing those, this column addresses what seems to be an important, though still loosely organized, project: establishing a standard code for Ethiopic. It introduces the principle of symbol representation by computers; reviews ASCII, the de facto standard for representing symbols; and brings Unicode, an organization formed to bring about an industry standard, to the reader's attention.

Some important terms:

Ethiopic: The Ge'ez alphabet used by Ethiopian languages such as Amharic, Oromigna, Tigrigna, and others.

symbol: A figure that represents a sound; for example, "a", "b", and "c".

code: A unique number assigned to a symbol (a character), usually written in decimal, octal, or hexadecimal.

ASCII: American Standard Code for Information Interchange.

Unicode: The Unicode Consortium.

Section 2/4

Symbols In Computer Eyes

We look at symbols and translate them into something to which they are related, such as sounds, messages, directions, orders, and so on. A symbol is distinguished from the rest of its family by its shape, form, and position. For us, this might be quite enough to recognize a symbol and comprehend its meaning, but for computers this is not necessarily true. For a computer, a symbol is simply a set of data kept on a storage device, which might be a disk, tape, or RAM. Well, what is this broad statement trying to say?

When you punch a key, let's say "a", the keyboard sends a signal to the computer's brain; in turn, the brain goes to the storage where the symbols are located and displays the symbol "a" on the screen. Without going into detail, it is this concept that programmers use to replace resident screen symbols with newly designed ones, including Ethiopic.

Every key on a keyboard, including the invisible ones, is associated with a defined signal, which we might think of as a code. The computer discerns one symbol from another by the signals sent from the keyboard. In other words, when we communicate with computers, we use a well-defined set of symbols--alphabet, digits, punctuation marks, graphics symbols, and signs--each associated with a signal.
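This symbol-to-code association can be seen directly in a modern programming language. Here is a small sketch in Python (a present-day illustration, not part of the original column), where ord() reveals a symbol's code and chr() reverses it:

```python
# Every symbol carries a numeric code; ord() reveals it, chr() reverses it.
for ch in "aA1?":
    print(ch, "->", ord(ch))

# The mapping is one-to-one: converting a code back yields the same symbol.
assert chr(ord("a")) == "a"
```

Note that "a" and "A" carry different codes; as far as the computer is concerned, they are two distinct symbols.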

Section 3/4

ASCII Coding Scheme

ASCII is the American Standard Code for Information Interchange; it is the de facto standard for the industry. ASCII proper defines 128 symbols, and its common 8-bit extensions define 256, including the alphabet, digits, punctuation marks, graphics symbols, and signs. Of course, you don't see all of these symbols on your keyboard, since a keyboard rarely has more than about 130 keys.

Naturally, not all software implements the full capacity of ASCII. Some programs restrict themselves to the first 128 ASCII symbols, and some stretch to the full 256 codes. All the symbols on your keyboard are found in the range 0-127.
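As a rough check of that claim, Python's string.printable approximates the set of symbols a standard US keyboard can type, and every one of them falls below code 128 (again a present-day sketch, not part of the original column):

```python
import string

# string.printable holds digits, letters, punctuation, and whitespace --
# roughly the symbols a standard US keyboard produces.
assert all(ord(c) < 128 for c in string.printable)
print("all keyboard symbols sit in the range 0-127")
```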

ASCII code can be represented in different numbering systems: decimal, binary, octal, and hexadecimal. For instance,


Symbol --- Decimal --- Binary ------ Octal --- Hexadecimal

A -------- 65 -------- 01000001 --- 101 ----- 41

B -------- 66 -------- 01000010 --- 102 ----- 42

C -------- 67 -------- 01000011 --- 103 ----- 43

Technically speaking, extended ASCII assigns 8 bits, or one BYTE, to represent a character (a symbol); the limit of 256 comes from this, since 2^8 = 256. We can't assign ASCII codes to more than 256 symbols within one set, and yet we can have a number of ASCII sets, each consisting of 256 symbols.
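The same code can be viewed in each numbering system. The following Python sketch (a modern illustration) reproduces the table above and the 256-symbol ceiling:

```python
# Print each symbol's code in decimal, binary, octal, and hexadecimal.
for ch in "ABC":
    n = ord(ch)
    print(f"{ch}  dec={n}  bin={n:08b}  oct={n:o}  hex={n:X}")

# One byte = 8 bits, hence the ceiling of 2^8 = 256 codes per set.
assert 2 ** 8 == 256
```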

Section 4/4

Unicode

As you have all noticed, the term "Unicode" has started flying around Cleo. Unless we examine it very closely, it may remain a ghost.

For non-English-speaking nations, ASCII is not as suitable as it is for the de facto standard holders. Even worse, it is a severe limitation for languages such as Japanese and Chinese. For instance, Japan has its own standard, called JIS (Japanese Industrial Standard), which recognizes 6,353 characters and implements 16-bit codes. Essentially, ASCII with these characteristics can't constitute an international standard.

The Unicode Consortium, a nonprofit organization formed by a number of American computer companies in January 1991, has taken the initiative to set up an international standard code based on 16 bits (two bytes). Unicode has already established what it calls the "specification for Unicode Version 1.0", and quite recently it has added Version 1.0.1. According to Cen Huang,

"Until recently, the Unicode standard contended to be the internationally accepted character encoding standard with a draft international standard known as ISO 10646. However, a recent agreement between the Unicode Consortium and the committee defining ISO 10646 has established a merger of the two standards: the Unicode standard will become a subset of the ISO 10646 character set.''

*Note: The ISO 10646 draft relies on 32 bits (four bytes), not on 16 bits (two bytes).

Unicode, as it does with some other languages, is working on Ethiopic to include it in its standard. It has produced a technical report on Ethiopic encoding which might raise some eyebrows. This report can be obtained via FTP from unicode.org (I don't remember the directory, but the file name is UTR_1.ascii).
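For readers with a modern system at hand: later versions of Unicode did assign Ethiopic code points, in a block starting at U+1200. The sketch below, which assumes a present-day Python interpreter rather than anything from the report above, shows how one such character occupies exactly one 16-bit code unit:

```python
# ETHIOPIC SYLLABLE HA -- assigned at U+1200 in a later Unicode version.
ha = "\u1200"

# In a 16-bit (two-byte) encoding the character is exactly one code unit...
assert ha.encode("utf-16-be") == b"\x12\x00"

# ...while UTF-8, a variable-width encoding, spends three bytes on it.
assert ha.encode("utf-8") == b"\xe1\x88\x80"
print("U+1200 fits in one 16-bit code unit")
```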

Judging from the technical report, it doesn't seem Ethiopians have taken any role in the project. While I applaud the Unicode technical committee for its comprehensive draft proposals, the approach considered by the committee ignores the intricate nature of Ethiopic characters. I would rather talk about this sometime in the future.

Abass Belay Alamnehe /