Litrit

Exploring Indic language text input systems.

Human communication has come a long way: from cave paintings to carrier pigeons and now to mobile phones, it’s been quite a journey. Along the way, humans have developed over 7000 languages since the origin of speech. As technology percolates to the far reaches of the earth, so grows the need for it to adapt to local cultures, making it accessible and equitable.

This project investigates the historical development of modern Indic scripts, their writing systems, mechanization, and adaptation to computing. It also takes a look at how text-based communication in Indic languages has evolved by reviewing previous adaptations and the status quo. This project attempts to document the steps that go into designing an Indic language keyboard. It also proposes an interface for text entry in Indic languages, suitable for modern smartphones.

India’s Next Billion Users

When it comes to topping the charts, India doesn’t rank very well, except for its population, cricket rankings, number of COVID-19 patients, and number of smartphone users. Gone are the days when owning a smartphone was considered a luxury. With so many manufacturers to choose from, and introductory prices as low as INR 3000, Indians are spoilt for choice. Rich or poor, old or young, almost everyone carries a smartphone in their pocket these days.

In recent years, India’s internet user base has grown immensely, as data plans have become cheaper than they were in 2013. Studies indicate that Indians consume more data per month than users anywhere else in the world. The average data consumed is close to 9.8 gigabytes per month, and a significant amount of this usage comes from smartphones. The price of data is, therefore, no longer a restriction when it comes to embracing digital platforms for data-intensive services such as e-learning modules, streaming movies, and other forms of entertainment.

Typing is one of the most frequent and basic activities carried out on a smartphone. Whether it is sending texts, commenting/posting on social media platforms, blogging, or filling up any form, the role of keyboards often goes unnoticed. Of course, most of us are familiar with the QWERTY keyboard; such is its popularity. As mentioned earlier, the number of smartphone and internet users in our country has risen exponentially. However, most of these new users hail from rural areas where English is not the preferred language. The majority of the content and interfaces people come across online are in English. This language barrier often limits the use of smartphones to just voice calls, and the user tends to miss out on other features.

It goes without saying that local language content and interfaces will play a crucial role in the adoption of technology at the grassroots level. Therefore, it is no surprise to see tech giants such as Google and Facebook (to name a few) investing in Indian language localization. The Government of India also launched the “Digital India” campaign, which aims at transforming India into a digitally empowered nation. There have been several attempts in the past to come up with an efficient text entry mechanism for Indic languages. However, due to technological constraints, along with the complex nature of Indic languages, they could not achieve mainstream adoption. With touchscreens, even though tactile feedback is lost, a lot more is possible. Virtual keyboards now offer features such as experimental layouts, new interactions, animations, stickers, personalized themes, and voice typing.

Background on Indian Scripts

Before we go on to type in Indic languages, it is essential to understand their nature. The Brahmi script, often considered to be the ancestor of Indian scripts, split into two major branches: one consisting of the south Indian or Dravidian scripts (Kannada, Malayalam, Tamil, and Telugu), and the other consisting of the north Indian scripts (Devanagari, Bengali, Gujarati, and Gurumukhi). They are all written from left to right.

Urdu is one of the 22 official languages of India and is spoken by almost 51 million people. Hindi and Urdu have the same language origins but follow two very different scripts. Hindi follows the Devanagari script, whereas Urdu follows the Nastaliq script derived from Persian and Arabic. Unlike Devanagari, it is written from right to left.

The Devanagari script serves as the base for many Indian languages such as Hindi, Konkani, Maithili, Marathi, Nepali, Sanskrit, and Sindhi. The Bengali script is used to write Bengali and Assamese. Gurumukhi is used to write Punjabi. Some of these languages may differ by having a few additional symbols to represent the purity of particular sounds.

One of the defining aspects of Indic scripts is that they share a similar phonetic nature and follow a logical structure. Each consonant represents a distinctive sound. Based on the articulatory mechanism used to produce these sounds, the consonants are divided into different classes or vargas. This arrangement is also known as the varnamala (see table below).

Origins | Voiceless plosives, unaspirated | Voiceless plosives, aspirated | Voiced plosives, unaspirated | Voiced plosives, aspirated | Nasal | Semi-vowels | Sibilants | Others
Velar/Glottal | क ka | ख kha | ग ga | घ gha | ङ ṅa | - | - | ह ha
Palatal | च ca | छ cha | ज ja | झ jha | ञ ña | य ya | श śa | -
Retroflex | ट ṭa | ठ ṭha | ड ḍa | ढ ḍha | ण ṇa | र ra | ष ṣa | -
Dental | त ta | थ tha | द da | ध dha | न na | ल la | स sa | -
Labial | प pa | फ pha | ब ba | भ bha | म ma | व va | - | -

  • Velar consonants are produced at the velum, which is the soft part of the roof of the mouth. These sounds are produced when the back of the tongue touches the velum. For example, in English, k and g are velar consonants.
  • Glottal consonants are produced deeper in the vocal tract. The sounds originate from the throat.
  • Palatal consonants are produced by raising the front of the tongue towards the nearest part of the roof of the mouth. The j in “job” is an example in English.
  • Retroflex consonants are produced by curling the tongue backwards and touching the roof of the mouth. Examples do not exist in English.
  • Dental consonants are produced with the tip of the tongue against the front upper teeth. The th in “thin” is an example.
  • Labial consonants are produced by bringing both lips close together. For example, the b in “baby”.
  • Consonants that produce a hissing sound are called sibilants. They are produced by using the tip of the tongue to push air out, which creates the hissing sound. Examples: श śa, ष ṣa, स sa.
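
The grid above maps naturally onto a small lookup table. The sketch below is only an illustration of that idea: the names PLACES, MANNERS, VARGA and classify are made up for this example, and it covers just the 25 plosives and nasals, leaving out the semi-vowels, the sibilants and ह.

```python
# Places of articulation (rows) and manners (columns) of the varnamala grid.
PLACES = ["velar", "palatal", "retroflex", "dental", "labial"]
MANNERS = [
    "voiceless unaspirated",
    "voiceless aspirated",
    "voiced unaspirated",
    "voiced aspirated",
    "nasal",
]

# One string per row, in the same order as the table above.
ROWS = ["कखगघङ", "चछजझञ", "टठडढण", "तथदधन", "पफबभम"]

# (place, manner) -> consonant
VARGA = {
    (place, manner): row[column]
    for place, row in zip(PLACES, ROWS)
    for column, manner in enumerate(MANNERS)
}


def classify(consonant):
    """Return the (place, manner) pair for one of the 25 grid consonants."""
    for key, value in VARGA.items():
        if value == consonant:
            return key
    raise KeyError(f"{consonant!r} is not in the 5 x 5 grid")


print(VARGA[("dental", "nasal")])
print(classify("झ"))
```

Because every consonant is addressed by its row and column, a keyboard that follows the varnamala can place a key without any per-character special cases.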

The table below shows the vowels. Most vowels come in a shorter and a longer form. Except for the first vowel, denoting the “a” sound, each vowel has a corresponding matra. These matras are attached to consonants or conjuncts, imparting their sound to the consonant or conjunct. A consonant or conjunct can have only one matra attached to it at any given time.

Vowels: a, ā, i, ī, u, ū, e, ai, o, au, lri
Symbols: ि
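
The one-matra rule described above can be written down as a small check. This is a minimal sketch, assuming the text is stored as Unicode and using the character names from the Unicode Devanagari block; the helper names dev, MATRAS and attach_matra are hypothetical and chosen only for this example.

```python
import unicodedata


def dev(name):
    """Look up a Devanagari character by its Unicode name suffix."""
    return unicodedata.lookup("DEVANAGARI " + name)


# Romanized vowel -> matra. The first vowel, "a", is inherent in every
# consonant and therefore has no matra of its own.
MATRAS = {
    "ā": dev("VOWEL SIGN AA"),
    "i": dev("VOWEL SIGN I"),
    "ī": dev("VOWEL SIGN II"),
    "u": dev("VOWEL SIGN U"),
    "ū": dev("VOWEL SIGN UU"),
    "e": dev("VOWEL SIGN E"),
    "ai": dev("VOWEL SIGN AI"),
    "o": dev("VOWEL SIGN O"),
    "au": dev("VOWEL SIGN AU"),
}


def attach_matra(base, vowel):
    """Attach the matra for `vowel` to a consonant or conjunct `base`."""
    if any(ch in MATRAS.values() for ch in base):
        raise ValueError("base already carries a matra")
    if vowel == "a":
        return base  # inherent vowel: nothing is added
    return base + MATRAS[vowel]


ka = dev("LETTER KA")
print(attach_matra(ka, "i"))
```

Calling attach_matra a second time on the result raises a ValueError, which mirrors the rule that a consonant or conjunct carries at most one matra.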

A bit of history

Typing in Indian languages is not a recent phenomenon. In fact, printing technology arrived on Indian shores way back in 1586, thanks to Christian missionaries who wanted to print the Bible in local languages. The first Devanagari typewriter, named the Nagari Lekhan Yantra, was designed by V. M. Atre in Germany and built by Remington. Because the Devanagari script has a significantly larger character set than the Roman script, it was difficult to fit into a keyboard layout. Godrej went on to develop its Devanagari typewriter in 1968 in collaboration with Optima, a German company. Since it was an adaptation of the English typewriter, it had to accommodate the Devanagari characters in place of the 26 lower- and upper-case Roman characters on the keyboard. Some of the features of this layout were as follows:

  • The keytops included the half-letters or pure consonant forms.
  • Among the vowels and their matra symbols, only basic shapes were included on the keytops. Other symbols were generated using a combination of keys.
  • Any spare unallocated keytops were replaced by other frequently used characters.
  • Concepts like the “dead key” (overstrike) and “half backspace” (move backward by half a character width) were introduced. These made it possible to place some of the lower and upper matra symbols.
  • The keys were arranged based on a visual order and not a logical order.
The Godrej Typewriter
A closer look at the keyboard. Each key is shared by two characters.

Looking at the above images, one can imagine how complicated it must’ve been to type on such a keyboard. The situation did not see much improvement even with the introduction of computers. It was quite clear that trying to accommodate the large number of characters onto a keyboard layout suitable for English was not a very bright idea. Efficient typing usually means a high rate of text entry, with minimum keystrokes, low cognitive effort, and minimal hand movement. If the learning curve is too steep, users may find it uncomfortable or undesirable to use. Listed below are some of the challenges faced when designing a keyboard for Indic languages.

Characteristics | Roman Script | Indic Scripts
Character set | Small character set of 26 alphabets including 5 vowels. | Much larger character set consisting of 33 consonants and 14 vowels.
Writing Sequence | Linear fashion of writing and reading. | Diacritic marks are not written or read in a linear sequence. They are placed on the top, bottom, or sides of a base consonant.
Learning Curve | Very easy, given the familiarity with the keyboard layout. | Can prove to be difficult, as users are not familiar with the layout.
Cognitive Effort | The keyboard layout for smaller screens is similar to its hardware counterpart. | Accommodating so many keys on a smaller screen may increase cognitive load in users.

Differences when it comes to Roman and Devanagari keyboards
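
The “minimum keystrokes” criterion mentioned earlier can be made concrete with a measure used in text-entry research, keystrokes per character (KSPC): the total number of key presses divided by the number of characters produced. The sketch below is an illustration only. The function name and the cost table are hypothetical, and it assumes that a character reached through Shift or an alternate screen costs two presses.

```python
def keystrokes_per_character(text, keystroke_cost):
    """Average number of key presses needed per character of `text`.

    `keystroke_cost` maps a character to the presses it needs on a given
    layout; characters missing from the map are assumed to need one press.
    """
    if not text:
        raise ValueError("text must not be empty")
    total = sum(keystroke_cost.get(ch, 1) for ch in text)
    return total / len(text)


# Hypothetical layout: क is on the base layer, ख needs Shift (two presses).
cost = {"ख": 2}
print(keystrokes_per_character("कख", cost))
```

A value of 1.0 means every character takes a single press. The more characters a layout pushes onto shifted or alternate layers, the further the figure rises above 1.0 for typical text.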

Keyboards are usually designed around one of two principles: the frequency with which characters are used, or the logical structure of the script (i.e. its alphabetical order). The InScript keyboard is an example of a partially logical and partially frequency-based layout. In contrast, the Keylekh keyboard serves as an example of a layout based on the logical structure of Indic scripts.

The InScript layout, where the Devanagari characters fight for space with the English letters.
Final design of Keylekh. Vowels on the left and consonants on the right.

Virtual Indic Keyboards

The first thing to keep in mind about using virtual Indic language keyboards on computers is that they are not very easy to find. One has to navigate to the language settings and install the preferred languages. In most cases, an on-screen keyboard appears for the user’s preferred language. Users also have the option of using the standard physical keyboard as a transliteration keyboard, where words typed in English letters are converted into words in the preferred language. In the case of on-screen keyboards, text entry takes place by clicking on individual characters one at a time. Some examples are shown below.

The InScript layout on Windows 10. The Shift keys give access to the alternate characters mapped to each key.
Nowadays, there are plenty of keyboards available online which allow users to input text in their desired language.
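
The transliteration mode mentioned above can be pictured as a greedy, longest-match rewrite of Roman letters into Devanagari. The sketch below is a toy built on that assumption, not how any particular product works. The ASCII romanization (“aa” for ā, “kh” for ख, and so on), the small character subset, and all the names in it are invented for this example.

```python
import unicodedata


def dev(name):
    """Look up a Devanagari character by its Unicode name suffix."""
    return unicodedata.lookup("DEVANAGARI " + name)


# A deliberately small, hypothetical romanization table.
CONSONANTS = {
    roman: dev("LETTER " + name)
    for roman, name in [
        ("k", "KA"), ("kh", "KHA"), ("g", "GA"), ("t", "TA"), ("d", "DA"),
        ("n", "NA"), ("p", "PA"), ("b", "BA"), ("m", "MA"), ("r", "RA"),
        ("l", "LA"), ("s", "SA"),
    ]
}
# roman vowel -> (independent letter, matra); "a" is inherent, so no matra.
VOWELS = {"a": (dev("LETTER A"), "")}
for roman, name in [("aa", "AA"), ("i", "I"), ("ii", "II"), ("u", "U"),
                    ("uu", "UU"), ("e", "E"), ("ai", "AI"), ("o", "O"),
                    ("au", "AU")]:
    VOWELS[roman] = (dev("LETTER " + name), dev("VOWEL SIGN " + name))
VIRAMA = dev("SIGN VIRAMA")  # suppresses the inherent "a"


def longest_match(table, text, pos):
    """Return the longest key of `table` found in `text` at `pos`, or None."""
    for length in (2, 1):
        key = text[pos:pos + length]
        if key in table:
            return key
    return None


def transliterate(roman):
    out, pos = [], 0
    while pos < len(roman):
        consonant = longest_match(CONSONANTS, roman, pos)
        if consonant:
            pos += len(consonant)
            out.append(CONSONANTS[consonant])
            vowel = longest_match(VOWELS, roman, pos)
            if vowel:
                pos += len(vowel)
                out.append(VOWELS[vowel][1])  # matra form after a consonant
            else:
                out.append(VIRAMA)  # no vowel follows: bare consonant
            continue
        vowel = longest_match(VOWELS, roman, pos)
        if vowel:
            pos += len(vowel)
            out.append(VOWELS[vowel][0])  # independent form elsewhere
            continue
        out.append(roman[pos])  # pass anything else through unchanged
        pos += 1
    return "".join(out)


print(transliterate("namaste"))
```

Real transliteration keyboards also rank candidate words and cope with inconsistent spellings, which this sketch makes no attempt to do.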

The challenge of fitting the vast number of characters in Indic scripts reaches a whole new level when designing for small screens such as smartphones. In most cases, a large portion of the screen gets covered by the keyboard. Finding the most efficient keyboard layout is difficult, as there are plenty of keyboards available on the market these days. Testing each of them one by one is extremely time-consuming. Trust me, I downloaded loads of keyboards and played around with them.

Google’s Gboard and Microsoft’s SwiftKey offer feature-rich, customizable keyboards in various Indian languages. Transliteration, handwriting recognition, and word prediction are some of the main features. However, it can take quite some time to get used to the layouts of these keyboards. The number of alternate screens for each language adds to the user’s confusion. To see how users react to using these keyboards, click here.

Another example is the Swarachakra keyboard, developed by Industrial Design Center, IIT Bombay. The initial layout displays only the consonants, which are phonetically grouped and arranged in a grid. This arrangement makes it easier for the user to locate the characters. When a user clicks or touches a consonant, a popup wheel (chakra) appears, which displays all the possible combinations of that consonant with the vowels. The popup wheel often ends up blanketing the surrounding characters. Because each language has a separate application, a user who wishes to type in several Indian languages has to download several Swarachakra apps.

Gboard: the Hindi keyboard has 4 alternate or “shift” screens (highlighted in yellow).
SwiftKey: the alternate screens (again highlighted in yellow) are tough to notice at first glance.
Swarachakra with its popup wheel blanketing surrounding characters and almost going off-screen in the process.
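
A little geometry shows why a popup wheel runs off-screen near the edges. The sketch below is purely illustrative and is not taken from Swarachakra. It spaces the wheel slots evenly on a circle around the touched key and checks whether any slot falls outside the screen. The function names, the screen size, and the radius are all assumed values.

```python
import math


def wheel_positions(cx, cy, radius, slots):
    """Centres of `slots` evenly spaced items on a circle around (cx, cy).

    Screen coordinates are assumed, with y growing downwards, so the first
    slot sits directly above the touched key.
    """
    positions = []
    for k in range(slots):
        angle = 2 * math.pi * k / slots - math.pi / 2
        positions.append((cx + radius * math.cos(angle),
                          cy + radius * math.sin(angle)))
    return positions


def overflows(positions, width, height):
    """True if any slot centre lies outside a width x height screen."""
    return any(not (0 <= x <= width and 0 <= y <= height)
               for x, y in positions)


# Hypothetical 360 x 640 screen, wheel radius 60, ten slots.
print(overflows(wheel_positions(180, 320, 60, 10), 360, 640))  # key mid-screen
print(overflows(wheel_positions(20, 320, 60, 10), 360, 640))   # key at left edge
```

A layout could respond to an overflow by shifting the wheel inwards or by using only part of the circle, at the cost of covering more of the neighbouring keys.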

Developing the Keyboard Interface

In this section, we finally take a look at my proposed keyboard. Inspired by the work of fellow researchers on the same topic, I set about making my own version of an Indic language keyboard. The Gestalt laws of grouping describe how humans tend to form patterns or simplify complex images when perceiving objects. Objects or shapes placed close together tend to form groups in our minds (the law of proximity). Likewise, when objects are placed together in a closed region, our minds consider them to be part of the same group. I applied both principles when grouping the consonants and vowels on the keyboard. Arranging the characters according to these laws helps users make sense of the layout.

Designing the keyboard was challenging. Countless hours went by trying to understand the phonetic nature of the characters. As it turns out, the number of characters differs from language to language. For example, Bengali has 50 characters, whereas Devanagari has 55. There are some rarely used characters as well, which raised the dilemma of whether to include them. Getting the layout right was crucial. This included fitting all the characters, their placement, shift/alternate screens, popups, and the overall height of the keyboard.

The overall process behind making this keyboard is summarized below.

My thought process behind making the keyboard is summarized below:

  • Aesthetics and Symmetry
    In the case of Indic language keyboards, with so many characters to select from, layout style can make all the difference when it comes to familiarizing the user with the interface. Adequate spacing between the characters and an even distribution across the layout were some of the things I kept in mind. The vowels are color-coded so that they are easily identifiable. A separator between the vowels and consonants gives the keyboard a much-needed symmetry.
  • Logical Structuring
    The keyboard layout follows the logical order of Indic scripts: the keys are arranged in alphabetical order. This reduces cognitive load by relying on the existing pedagogical systems known to children throughout India. This familiarity makes it easier to locate the characters, instead of having to ‘hunt’ for them as on other keyboards.
  • Frequency of Use
    By now, we have already established that fitting all the characters on a single layout is impossible. As on other keyboards, the smart thing to do is to display the most frequently used characters upfront, while the rest appear on an alternate screen or in popups.
  • Categorical Integrity
    All characters belonging to a particular category, be it vowels, consonants, symbols, or numbers, are placed together. Again, this makes it easier to locate similar characters instead of hunting for them across the keyboard.
  • Functionality and Features
    Adding new features is a work in progress. Currently, users can switch between two languages only. In the future, I would like to include other Indian languages as well. Additional features such as personalized themes, emojis, and voice typing also need to be added.
Initial layout, shift screen and popup characters
Type in multiple languages
Dynamically changing vowels
Express yourself better with emoji
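
The “Logical Structuring” and “Frequency of Use” principles listed above can be combined in a few lines. The sketch below picks the most frequent characters for the first screen while keeping each screen in alphabetical order. It is a minimal illustration, not the logic used in the app. The function name, the sample text, and the slot count are made up.

```python
from collections import Counter


def split_screens(ordered_chars, corpus, primary_slots):
    """Split characters between a primary screen and an alternate screen.

    `ordered_chars` is the character set in its logical (alphabetical)
    order, `corpus` is sample text used to estimate frequency, and
    `primary_slots` is how many keys fit on the first screen.
    """
    counts = Counter(corpus)
    # sorted() is stable, so ties keep their logical order.
    ranked = sorted(ordered_chars, key=lambda ch: -counts[ch])
    primary = set(ranked[:primary_slots])
    first = [ch for ch in ordered_chars if ch in primary]
    second = [ch for ch in ordered_chars if ch not in primary]
    return first, second


# Hypothetical sample: ङ appears three times, ग twice, क once.
first, second = split_screens("कखगघङ", "कगगङङङ", 3)
print(first, second)
```

Frequency only decides which screen a character lands on. Within a screen the keys stay in varnamala order, so the layout remains familiar to anyone who learned the script at school.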

To install the keyboard application on your Android phone, click here. You can also see how it works on the phone using the same link.

Conclusion

As we head towards a digital ecosystem, it is crucial to keep in mind the diverse nature of our society. Earlier technologies imposed certain limitations on us, which contributed towards creating a digital divide. The ubiquity of our phones has gone a long way towards bridging this gap. Moreover, adopting local languages in digital services would make those services easier to use for the people they are meant to reach. With that in mind, this project explored a variety of topics, from the nature of Indic scripts to previous attempts at creating an interface for text entry in Indic languages.

Currently, we are witnessing the integration of voice-enabled features within existing interfaces. With improvements in speech recognition technology and machine learning algorithms, it would be interesting to see how they adapt to the complex nature of Indian languages.

To dive deeper into this project and read more about it, click here.