Archive for the 'Hebrew' Category

Weird GMail Habit: Removing Control Characters

GMail has a weirdish feature that probably very few people except me know about. When using it with a Hebrew user interface, invisible control characters—LRM, RLM, RLE, LRE and the like—are added to some strings to make them appear correctly in a mixed-direction interface.

Most notably, they are added to email addresses. I sometimes want to copy these email addresses as text, and my mouse pointer picks up the control characters as well. Of course, these control characters are by themselves invisible to humans, but very much visible to computers, and an email address that contains them is not valid, even if it looks the same to human eyes.

It has already become a habit for me to carefully delete and manually retype the first and the last characters of an email address to make sure that the control characters are removed.
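The same cleanup can be done automatically. Here is a minimal Python sketch, assuming the stray characters are the usual Unicode bidi controls; the exact set GMail inserts is not confirmed here, and the name `strip_bidi_controls` and the sample address are made up for illustration.

```python
# Unicode bidi control characters (an assumed set, not a list of what GMail emits).
BIDI_CONTROLS = {
    "\u200e",  # LRM, left-to-right mark
    "\u200f",  # RLM, right-to-left mark
    "\u202a",  # LRE, left-to-right embedding
    "\u202b",  # RLE, right-to-left embedding
    "\u202c",  # PDF, pop directional formatting
    "\u202d",  # LRO, left-to-right override
    "\u202e",  # RLO, right-to-left override
    "\u2066",  # LRI, left-to-right isolate
    "\u2067",  # RLI, right-to-left isolate
    "\u2068",  # FSI, first strong isolate
    "\u2069",  # PDI, pop directional isolate
}


def strip_bidi_controls(text: str) -> str:
    """Return text with every bidi control character removed."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


# A hypothetical copied address wrapped in LRE ... PDF.
copied = "\u202auser@example.com\u202c"
clean = strip_bidi_controls(copied)
print(len(copied), len(clean))
print(clean)
```

The two lengths differ even though both strings look identical on screen, which is exactly the problem.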

It would be better if GMail just used the <bdi> element or CSS bidi isolation. Both are fairly well supported in modern browsers and provide a better experience.

Guess Which Software the Only Hebrew TLD Runs

There already are several TLDs in the Arabic script for several Arab countries. There are no TLDs in the Hebrew script yet, although one will probably soon be created for Israel.

There is, however, a test TLD in Hebrew: “טעסט”. (That’s the word “test” written in Hebrew characters according to Yiddish spelling rules.)

And there’s even an actual working domain in it: http://דוגמה.טעסט. That can be translated as “example.test”. The TLD “טעסט” now appears to the left of “דוגמה”, which is the name, because Hebrew is written right-to-left.
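The difference between how the domain looks and how it is stored can be checked with a short Python sketch. It assumes Python's built-in `idna` codec, which implements the older IDNA 2003 rules. In memory the name comes first and the TLD last, whatever the display order, and the form that goes over the wire is plain ASCII with an `xn--` prefix on each label.

```python
# The domain from the text, written with escapes so that the logical
# (in-memory) order is unambiguous: name first, TLD last.
name = "\u05d3\u05d5\u05d2\u05de\u05d4"  # dalet, vav, gimel, mem, he
tld = "\u05d8\u05e2\u05e1\u05d8"         # tet, ayin, samekh, tet
domain = name + "." + tld

# The TLD is the last label in logical order, even though a
# right-to-left display puts it on the left.
labels = domain.split(".")
print(labels[-1] == tld)

# ASCII-compatible form, using Python's built-in IDNA 2003 codec.
ascii_form = domain.encode("idna")
print(ascii_form.decode("ascii"))

# Decoding brings back the original Hebrew labels.
round_trip = ascii_form.decode("idna")
print(round_trip == domain)
```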

And what happens if you use your browser to go to that domain? It redirects you to http://דוגמה.טעסט/עמוד_ראשי. That string at the end (or in the middle, if you will) is the standard Hebrew title of a MediaWiki main page, which you can also see on the Hebrew Wikipedia. The hypothesis that MediaWiki is installed there is further supported by a Google site search on the same domain: http://www.google.com/search?q=site:דוגמה.טעסט. Something in the installation is probably broken, because the pages appear blank, but the page titles can only mean one thing: MediaWiki is, or was, being used to test a Hebrew domain name.

(This post is based on information from Tomer Cohen of Mozilla Israel.)

Facebook, give me my RLM back, please

Facebook doesn’t allow typing LRM and RLM characters in the status field. These are the Unicode characters “Left-to-Right Mark” and “Right-to-Left Mark”. People who type in right-to-left languages such as Arabic, Persian, Urdu or Hebrew need these characters to make their status updates appear properly aligned. If i try to type any of these characters, they are deleted when i save the message. There is no reason to do this. Facebook engineers, please allow your users to use these characters. Thank you.

Language teacher

If you search Google for “language teacher” (מורה ללשון) in Hebrew, the autocompletion suggests “language teacher killed herself” (מורה ללשון התאבדה). The word “teacher” is spelled the same for both genders, but the verb is feminine. I don’t know why this happens, because actually searching for it doesn’t yield anything significant.

In Israeli schools where Hebrew is the medium of teaching, “Language” is the class where the grammar of Hebrew is taught… badly.

Roth

miriamruth11-hp; copyright: Google; based on the original illustration by Ora Ayal

Today the logo appearing at the top of Google.co.il honors Miriam Roth, the author of the famous Hebrew children’s book “A Tale of Five Balloons”. She was born on the 16th of February in 1910.

The Google employee who uploaded the image made a mistake: the filename is “miriamruth”, but it should be “miriamroth”. That’s what happens when there’s no proper way to write the vowels: her last name is written רות, which is how the Biblical name “Ruth”, still common in modern Israel, is written. But the German last name “Roth” is written the same way, because in Hebrew “u” and “o” are usually written using the same letter, Vav.

There is a way to differentiate the sounds: רוּת is “Ruth” and רוֹת is “Roth”. Notice the placement of the dot in relation to the letter in the middle. The sign for “u” is called shuruk, and the sign for “o” is called holam; i wrote the bulk of the articles about them in Wikipedia. Most people don’t type these signs; usually it’s fairly easy to guess the correct pronunciation, but people don’t use these signs even when it’s needed, as is the case with Ruth/Roth, because typing them on the standard Hebrew keyboard is very hard.
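In Unicode terms the two spellings differ by a single combining point placed after the Vav. A minimal Python sketch, using the standard `unicodedata` module and a made-up helper name `strip_points`, shows that dropping the points makes the two names collapse into the same unpointed string:

```python
import unicodedata

# Pointed spellings, written with escapes to make the code points explicit.
RUTH = "\u05e8\u05d5\u05bc\u05ea"  # resh, vav + U+05BC (the dot of shuruk), tav
ROTH = "\u05e8\u05d5\u05b9\u05ea"  # resh, vav + U+05B9 (holam), tav


def strip_points(word: str) -> str:
    """Drop combining marks (Unicode category Mn), leaving only the letters."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


# Show which code points each spelling is made of.
for word in (RUTH, ROTH):
    print([f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in word])

# With the points the names differ; without them they are the same string.
print(RUTH == ROTH)
print(strip_points(RUTH) == strip_points(ROTH))
```

Text typed without the points therefore carries no information about which name was meant.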

For years this made me very angry, so i asked the Standards Institute of Israel to develop a new standard keyboard in which it will be easy to type these signs. I was successful at convincing the SII to do it. The work is now underway, and i actively participate in the monthly meetings, together with representatives from Hamakor – the Israeli association for free and open source software, Israel Internet Association, IBM, Microsoft, Apple, Google and other companies. I hope that the standard will be published in 2011; the technical implementation of the keyboard layout will take about ten minutes on each operating system, and shortly after that, i hope, it will be distributed to computers using some kind of an auto-update mechanism.

And then, i hope, we’ll start to see at least slightly richer Hebrew typography everywhere. I want it to happen, not just because it’s a nice tradition, but because this will simply make Hebrew easier to read – and will prevent silly mistakes, like pronouncing and writing “Ruth” instead of “Roth”.


See also: Maqaf.

Unbearable Lightness

I was invited to the 10th anniversary celebration of the Catalan Wikipedia in Perpignan. Perpignan is a city in France, but from the Catalan point of view, it’s in Northern Catalonia – a rather large territory, also known as Roussillon, that was a part of Catalonia, but passed under French rule in 1659. Catalan is still spoken by many people there; how many exactly – i’ll have to see. I hope that it’s spoken by many people for a purely practical reason – my Catalan is much better than my French.

The Catalan Wikipedia is one of the first two Wikipedias created after the English one. The English Wikipedia was created on the 15th of January 2001; the German and Catalan ones were created on the 16th of March 2001. Catalans love to point out that although their Wikipedia was created a few minutes after the German one, it was the first to have an actual article.

Since the Catalan Wikipedia is the oldest and the largest version of Wikipedia in a language which isn’t official in any big country (sorry, Andorra), the people behind it want to share their experience of promoting their language with speakers of other regional and minoritized languages. This will be discussed at the event. More details on that later.


Direct El-Al flight from Tel-Aviv to Barcelona – 582 USD. Alitalia via Rome, 2 hours wait for connection – 460 USD. Czech Airlines (ČSA) via Prague, 11 hours wait for connection – 367 USD. Guess which one i picked. ČSA, of course – i pay less and i get to spend a day in Prague! Sorry, El-Al.

If you call the Czech Airlines office in Tel-Aviv, you can choose one of the following languages, in that order: English, Russian, German, Czech, French, Spanish, Italian. No Hebrew or Arabic. Apart from that, however, the service is excellent. I spoke in Russian with the service people and they were very polite, helpful and efficient. They were Czech; they spoke Russian with a slight accent, but it was completely correct and easy to understand. I’ll have to wait for the flight itself to see how it is, but so far my impression is very good.


P.S. Typing the word “Czech” is surprisingly hard.

Componenta

Israeli programmers use many words of English origin when they speak Hebrew. (Many of them prefer to write only in English instead of Hebrew, which is a separate issue.)

When they use these English words, they tend to adapt them to Hebrew pronunciation. Some adaptations are simple, for example “router” is pronounced with an Israeli, rather than English [r] sound (some people – not necessarily purists! – use the Hebrew word נַתָּב [natav] for that). “SQL” is rarely pronounced as “sequel” – usually it’s “ess cue el”, and the same goes for MySQL.

But some are harder to explain. For example, “component” is often pronounced [kompoˈnenta]. I heard it in several companies and i don’t quite understand why. Note the [a] at the end and the stress, too: in English it’s supposed to be something in the area of [kʌmˈpoʊnənt] – with stress on the second syllable, not the third. I have never heard an Israeli programmer pronounce it with correct stress when speaking in English – i always hear it as [ˈkomponənt] – with stress on the first syllable and with [o] in the first two syllables.

The only languages available on Google Translate in which this word is anywhere near [kompoˈnenta] are Serbian (компонента), German (Komponente), Romanian (componentă), and Spanish and Italian (componente). It may have something to do with them, but the explanation is probably more complicated. Does anyone have any idea?

