Guidelines of a Logical User Interface for Editing Bidirectional Text

Author	Matitiahu Allouche
Date	2002-03-20
This Version	2.4
Previous Version	2.3

Change History

Version 2.4

Add guidelines for management of the cursor in section "Cursor"
Add explanations in section "Left and Right Arrows"
Add explanation in section "Entry of Text"
Add clarification and note in section "Backspace and Delete"
Add clarification and new example in section "Effect of keyboard language change on the cursor level"
Various editorial improvements


Version 2.3

Add clarifications in section "Next/Previous Word"
Add clarifications in section "Backspace and Delete"
Add clarifications in section "Alternative Guidelines for a Simplified Implementation"

Version 2.2

Remove section "Replace mode on Boundary"

Version 2.1

Enhance definition of "Paragraph Embedding Level"
Add definition for "Visual Position"
Add clarifications in section "Assumptions"
Change reference of LTR keyboard language from "Latin" to "English"
Add note in section "Replace Mode on Boundary"
Add notes in section "Selection"

Version 2.0

Change the document title
Add "Definitions" section
Change text on keyboard language in the "Assumptions" section
Change text on keyboard language in "Set cursor level and keyboard language according to new cursor position"
Remove section "Bidi Levels" (content moved to "Definitions")
Add section "Position the Cursor with a pointing device"
Add examples to "Entry of Text" and "Home and End" sections
Move simplified implementation guidelines to a separate appendix
Many editorial changes

Version 1.9a

Add explanations in "Programming Model"
The note at the end of subsection "Visual Approach" of section "Left and Right Arrows" was deleted (invalidated by the change in "Effect of keyboard language change on the cursor level")
Add examples to "Backspace and Delete"
Change of whole content of "Effect of keyboard language change on the cursor level"

Version 1.8

Add specification of logical cursor movement in "Left and Right Arrows"
Add detail for Visual and Logical selection
Correct error in "Conversion of cursor logical position to visual position" (case c9)
Add simplified specification in "Conversion of cursor logical position to visual position"

Version 1.7

Add detail on Assumptions
Add detail on Entry of Text

Version 1.6

Add detail on behavior of Backspace and Delete on boundary
Add detail about setting cursor level according to new cursor position
Add explanation about how changing keyboard language affects cursor level

Version 1.5

Change document format to HTML
Add section "Guidelines for Programming Editors"

1 Introduction

This document outlines the guidelines of a User Interface (in short: UI) for editing bidirectional (in short: Bidi) text. It is assumed that the user enters text in logical sequence, and that the Unicode Bidi Algorithm (in short: UBA) is used to reorder the text for presentation. We assume that readers of this document have a working knowledge of the UBA, which is described in Unicode Technical Report 9 (see http://www.unicode.org/unicode/reports/tr9).

When designing these guidelines, the following objectives were set, in order of decreasing priority:

Prevent actions unexpected by the user, particularly when the action is destructive (erases one or more characters).
Make the interface efficient.
Keep the interface easy to implement.

2 Abbreviations

Bidi	bidirectional
LTR	left-to-right
RTL	right-to-left
UBA	Unicode Bidi Algorithm
UI	User Interface

3 Definitions

Bidi Embedding Levels	The UBA assigns a level to each character in the logical buffer, including neutrals; this level determines whether the character is part of LTR or RTL text, and eventually affects the presentation. Level 0 corresponds to base LTR text. Level 1 corresponds to base RTL text, or to RTL text embedded within level 0 LTR text. Level 2 corresponds to LTR text embedded within level 1 RTL text, itself possibly embedded within level 0 LTR text. And so on for higher levels. Even levels always correspond to LTR text; odd levels always correspond to RTL text.

Current Position	Position in the logical buffer where actions like text entry or Delete are going to take effect.

Cursor Direction	Expected direction of cursor progression when entering text. It is LTR when the cursor level is even, RTL when the cursor level is odd. Note that the actual progression can be different (for instance, when entering a digit within Hebrew text).

Cursor Level	For the needs of the UI, a Bidi level is also assigned to the cursor. This level reflects the Bidi level that is expected to be assigned to the next character entered (in some cases, the actual level of the entered character will be different). The level of the cursor is manipulated by UI functions such as changing the keyboard language. It may also be affected by any function that changes the position of the cursor.

Keyboard Language	Language of the next character that will be entered from the keyboard. The keyboard language can be set manually by the user. It can also be changed by almost all functions except text entry.

Logical Buffer	Buffer containing the text data in logical sequence (as opposed to visual sequence).

Paragraph Embedding Level	Bidi level of text belonging to the main language used in a paragraph. This is 0 if the main language is LTR, 1 if the main language is RTL. Note: this is also known as "Base Direction".

Text Cursor	Graphic representation of where actions like text entry or Delete are going to take effect. The text cursor is often displayed as a vertical bar preceding the position where a newly entered character is expected to appear.

Visual Position	Location on a display device where the text cursor is represented. The visual position can be manipulated directly by functions like arrow keys, or using a pointing device.

4 Assumptions

Languages can be associated with a direction, either LTR (for languages like English, Greek, Russian) or RTL (for languages like Arabic and Hebrew).
The keyboard must be able to generate more than one language, at least one LTR language and one RTL language. The means to change the keyboard language (key combinations, or GUI gadgets) are not addressed by this document.
A default language is defined for each direction. Since even Bidi levels correspond to LTR direction and odd Bidi levels correspond to RTL direction, each level thus has a default language, which is the default language associated with the direction corresponding to that level (for example: English for even levels, Arabic for odd levels). The default language for each direction may be specified in some user profile.
When the cursor is moved to a new position, the keyboard language must be set automatically to the language of the context. In case of ambiguity (for instance at the boundary between two text segments using different languages), the keyboard language must be set to the language of the segment preceding the new position. However, if the new position was reached by contiguous cursor positioning (for instance with the left or right arrow keys), the keyboard language should be set to the language of the last character jumped over.
The UI must provide means to set the paragraph embedding level (paragraph orientation), unless only one paragraph orientation is expected to be used. How this is done is not addressed by this document.
The UI may provide means to specify explicit embedding levels. These means are not addressed by this document. In the absence of such means, the Bidi levels are limited to 0, 1 and 2 for a paragraph embedding level of 0, and to 1 and 2 for a paragraph embedding level of 1.
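The level-to-direction and default-language rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not part of the guidelines: the `LanguageProfile` class stands in for the user profile mentioned above, and all names, as well as the English/Arabic defaults, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LanguageProfile:
    """Hypothetical user profile holding the default language for each direction."""
    ltr_default: str = "English"
    rtl_default: str = "Arabic"


def is_rtl_level(level: int) -> bool:
    """Odd Bidi levels correspond to RTL text, even levels to LTR text."""
    return level % 2 == 1


def default_language(level: int, profile: LanguageProfile) -> str:
    """Default keyboard language of a level: the default language of its direction."""
    return profile.rtl_default if is_rtl_level(level) else profile.ltr_default


def implicit_levels(paragraph_level: int) -> List[int]:
    """Levels available without explicit embeddings: 0, 1, 2 (LTR paragraph) or 1, 2 (RTL)."""
    if paragraph_level not in (0, 1):
        raise ValueError("paragraph embedding level must be 0 or 1")
    return list(range(paragraph_level, 3))
```

For example, with the default profile, level 2 (LTR text embedded in RTL text) maps back to English.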

5 Programming Model

This document assumes a programming model for text editing, where the text data is stored in logical sequence in a buffer, but is presented to the user in visual sequence.

The current point of insertion in the text is called the current position in the logical buffer and is indicated by a text cursor on the display.

The special case when the point of insertion is before the first logical character must be handled as if there was a dummy character with a Bidi level equal to the paragraph embedding level preceding the data in the logical buffer.

The special case when the point of insertion is after the last logical character must be handled as if there was a dummy character with a Bidi level equal to the paragraph embedding level following the data in the logical buffer.

Note: Bidi levels are defined in section 3 (Definitions).
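The dummy characters of this programming model can be sketched as two small helpers that return the level on either side of an insertion point. This is a hypothetical Python sketch: positions run from 0 (before the first character) to the buffer length (after the last one), and the function names are illustrative only.

```python
from typing import Sequence


def _check(levels: Sequence[int], pos: int) -> None:
    if not 0 <= pos <= len(levels):
        raise IndexError(f"position {pos} outside 0..{len(levels)}")


def level_before(levels: Sequence[int], pos: int, paragraph_level: int) -> int:
    """Level of the character logically before insertion point `pos`."""
    _check(levels, pos)
    # Before the first character, a dummy character at the paragraph level is assumed.
    return levels[pos - 1] if pos > 0 else paragraph_level


def level_after(levels: Sequence[int], pos: int, paragraph_level: int) -> int:
    """Level of the character logically after insertion point `pos`."""
    _check(levels, pos)
    # After the last character, a dummy character at the paragraph level is assumed.
    return levels[pos] if pos < len(levels) else paragraph_level
```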

6 Cursor

The text cursor is sometimes represented graphically as a vertical bar before the position where a newly typed character would appear. Other implementations represent the text cursor as an underscore located under the current character (the character at the current position). In this document, we assume a vertical cursor displayed before the current character. However, "before" must be interpreted relative to the orientation of the current character: if the current character belongs to an LTR run, "before" means "to the left"; if the current character belongs to an RTL run, "before" means "to the right".

Guidelines for management of the cursor

The cursor has two roles:

visual feedback for the location of the last function (for instance: after data entry, after Delete, after Backspace)
visual indication where the next function will take place.

Note: at a boundary between runs of different levels, the two roles may conflict. The one which prevails depends on the last function executed.

7 Logical and Visual functions

Some UI functions are easier to define in relation to the logical buffer and the current position; we call them "logical" functions. Other UI functions are easier to define in relation to the display and the text cursor; we call them "visual" functions.

Most logical functions may be handled for Bidi text exactly as non-Bidi text is handled by a Bidi-unaware text editing application.

The following are considered visual functions:

Non-contiguous cursor positioning (using, for instance, a pointing device such as a mouse)
Contiguous cursor positioning (using, for instance, keyboard keys such as right, left, up, down arrow keys)
PageUp, PageDown
Tab, BackTab (when used within a text editing unit)

The following are considered logical functions:

Entry of text (both in Insert and in Replace modes)
Delete 1 character forward
Delete 1 character backward (Backspace)
Home
End
Newline

8 Position the Cursor with a pointing device

The new visual position must be translated to a current position in the logical buffer, and new values must be computed for the cursor level and the keyboard language.

9 Set cursor level and keyboard language according to new cursor position

When the cursor position is changed by a visual function, the cursor level and keyboard language may be changed as explained below.

If the new cursor position is viewed by the software as related to one character (whether on, before or after that character), the cursor level must be set to the level of this character, and the keyboard language set to the language of that character (or of its context if the character itself can be used in more than one language).

If the new cursor position is viewed by the software as located between two characters without being related to one character more than to the other one,

if the new cursor position is between characters with the same Bidi level, the cursor level must be set to that level, and the keyboard language set to the language of the character preceding the cursor (or of its context if the character itself can be used in more than one language).
if the new cursor position is between characters with different Bidi levels, the cursor level must be set to the lower of the two, and the keyboard language set to the language of the character with the lower level (or of its context if the character itself can be used in more than one language).
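The rules of this section could be sketched as follows. The sketch assumes, hypothetically, that the language of each character has already been resolved from its context, and that the dummy characters around the buffer carry the paragraph embedding level and a caller-supplied paragraph language. Whether a position is "related to one character" is decided by the display software; here the caller signals it with the hypothetical `related_index` parameter.

```python
from typing import Optional, Sequence, Tuple


def level_and_language_after_move(
    levels: Sequence[int],
    languages: Sequence[str],
    pos: int,
    paragraph_level: int,
    paragraph_language: str,
    related_index: Optional[int] = None,
) -> Tuple[int, str]:
    """Cursor level and keyboard language after a visual function moves the cursor to `pos`."""

    def char(i: int) -> Tuple[int, str]:
        # Dummy characters outside the buffer carry the paragraph level and language.
        if 0 <= i < len(levels):
            return levels[i], languages[i]
        return paragraph_level, paragraph_language

    if related_index is not None:
        # The position is tied to one character (on, before or after it).
        return char(related_index)
    before, after = char(pos - 1), char(pos)
    if before[0] == after[0]:
        # Same level: take the language of the preceding character.
        return before
    # Different levels: take the character with the lower level.
    return before if before[0] < after[0] else after
```

With the buffer "latinHEBREW", a click at logical position 5 (between "n" and "H") yields level 0 and English, as in the example of the next section.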

10 Left and Right Arrows

As stated above, left and right arrows are considered visual functions. This is the recommended approach. However, a "logical" approach is sometimes implemented. A specification for such an approach will be presented in an appendix.

Visual Approach

The left arrow should always move the cursor leftwards on the screen, the right arrow should always move the cursor rightwards. This is equivalent to backward and forward (respectively) in the logical buffer only if the Bidi level is not odd (odd levels correspond to Arabic or Hebrew text).

Moving over a character must leave the cursor close to that character, on the right side after a Right Arrow, on the left side after a Left Arrow.

In technical terms, moving over a character using the left or right arrow key must set the cursor Bidi level to the level of that character and the keyboard language to the language of that character (or of its context if the character itself can be used in more than one language).

The current position in the logical buffer must be set adjacent to that character: before the character for left arrow and even level or right arrow and odd level; after the character for left arrow and odd level or right arrow and even level.
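A minimal Python sketch of the rule above follows. It assumes that the display layer has already determined which character the arrow jumps over (its logical index and Bidi level); `move_over_character` is a hypothetical name, and the keyboard language update is left out.

```python
from typing import Tuple


def move_over_character(index: int, level: int, arrow: str) -> Tuple[int, int]:
    """New logical position and cursor level after Left/Right Arrow jumps over the
    character at logical `index`, whose Bidi level is `level`."""
    if arrow not in ("left", "right"):
        raise ValueError(f"unknown arrow: {arrow!r}")
    even = level % 2 == 0
    # Before the character: Left Arrow on an even level, or Right Arrow on an odd level.
    before = (arrow == "left") == even
    return (index if before else index + 1), level
```

In the "latinHEBREW" buffer, "W" is at logical index 10 with level 1: Right Arrow over it gives position 10 (displayed between "W" and "E"), and Left Arrow back over it gives position 11, displayed between "n" and "W" at level 1.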

Example:         latin|WERBEH

After clicking between "n" and "W", the keyboard language will be set to the default LTR language (e.g. English). If the user wants to add to the Hebrew text, all the user has to do is press the right arrow (which changes the cursor level and keyboard language to match the Hebrew "W" and displays the cursor between the "W" and the "E"), then press the left arrow to get back between the "n" and the "W", but in "Hebrew mode".

Left and Right Arrows at line boundary

This section addresses multiline text controls.

When the paragraph embedding level is even (LTR paragraph), and the cursor is located on the right side of the rightmost character of the line, pressing Right Arrow must position the cursor on the left side of the leftmost character of the following line (even if the paragraph embedding level of the following line is odd, that is, an RTL paragraph).

When the paragraph embedding level is even (LTR paragraph), and the cursor is located on the left side of the leftmost character of the line, pressing Left Arrow must position the cursor on the right side of the rightmost character of the previous line (even if the paragraph embedding level of the previous line is odd, that is, an RTL paragraph).

When the paragraph embedding level is odd (RTL paragraph), and the cursor is located on the left side of the leftmost character of the line, pressing Left Arrow must position the cursor on the right side of the rightmost character of the following line (even if the paragraph embedding level of the following line is even, that is, an LTR paragraph).

When the paragraph embedding level is odd (RTL paragraph), and the cursor is located on the right side of the rightmost character of the line, pressing Right Arrow must position the cursor on the left side of the leftmost character of the previous line (even if the paragraph embedding level of the previous line is even, that is, an LTR paragraph).

Note: what should happen when the cursor should go to the following line but is already on the last line, or should go to the previous line but is already on the first line, is not specified by this document, because it is not a specific Bidi issue. One possible way to handle such a situation is to leave the cursor where it is.
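The four line-boundary rules reduce to a small decision function. In this hypothetical Python sketch, lines are identified only by a relative offset (+1 for the following line, -1 for the previous one), and reaching the first or last line is left to the caller, as the note above leaves it unspecified.

```python
from typing import Optional, Tuple


def wrap_at_line_boundary(
    paragraph_level: int, arrow: str, cursor_edge: str
) -> Optional[Tuple[int, str]]:
    """Return (line_delta, landing_edge) when the arrow wraps to another line, else None.

    `cursor_edge` is "left" if the cursor is left of the leftmost character of the line,
    "right" if it is right of the rightmost one. `landing_edge` is the side of the target
    line where the cursor lands, whatever that line's own paragraph level is."""
    if arrow != cursor_edge:
        return None  # the arrow does not push the cursor past this edge
    rtl = paragraph_level % 2 == 1
    # In an LTR paragraph Right Arrow goes forward; in an RTL paragraph Left Arrow does.
    forward = (arrow == "left") if rtl else (arrow == "right")
    landing_edge = "left" if arrow == "right" else "right"
    return (1 if forward else -1), landing_edge
```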

11 Next/Previous Word

When associated with arrow keys (e.g. Alt or Ctrl + Right or Left arrow), the "Next Word" and "Previous Word" functions must be interpreted as "Next Word on the Right" or "Next Word on the Left". This is the recommended approach. However, a "logical" approach is sometimes implemented. A specification for such an approach will be presented in an appendix.

The cursor must be positioned on the logical beginning of the word, which will be the leftmost character for an LTR word and the rightmost character for an RTL word.

The cursor level must be set to the level of the first logical character of the word.

The keyboard language should automatically change, if needed, to match the language of the word at the new position of the cursor.

Example:         i work MBI ROF and i like my job.

Assuming the cursor is at the beginning of the line, pressing Ctrl + Right Arrow repeatedly makes the cursor land on the following characters: w I F a i l m j.
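The landing rule can be illustrated with a simplified model in which a displayed line is a list of words, each given as it appears on screen together with a single Bidi level. This is a hypothetical sketch: real words may mix levels (for instance digits inside Hebrew), in which case the level of the first logical character must be looked up in the buffer. The test reproduces the example above.

```python
from typing import List, Sequence, Tuple

# A displayed word: (text as it appears on screen, Bidi level of the word).
Word = Tuple[str, int]


def logical_beginning(word: Word) -> Tuple[str, int]:
    """Character on which the cursor lands, with the new cursor level:
    the leftmost character of an LTR word, the rightmost of an RTL word."""
    text, level = word
    return (text[-1] if level % 2 else text[0]), level


def ctrl_right_stops(line: Sequence[Word], start: int = 0) -> List[Tuple[str, int]]:
    """Stops of repeated "Next Word on the Right" from word `start` of a displayed line."""
    return [logical_beginning(w) for w in line[start + 1:]]
```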

12 Up and Down Arrows

Most implementations of Up and Down arrows move the current position up or down in the text by 1 line, while trying to preserve the horizontal coordinate of the cursor. This is equivalent to setting a new visual position, which must be translated to a logical current position in the buffer, and new values must be computed for the cursor level and the keyboard language.

13 PageUp, PageDown

PageUp and PageDown move the current position up or down in the text by a number of lines. As for the new horizontal coordinate of the cursor, some implementations move the cursor to the beginning of the new line; in that case, PageDown and PageUp are logical functions similar to Home. If PageUp and PageDown maintain the visual x coordinate of the cursor, they are equivalent to pointing to that position (e.g. with the mouse): a new visual position is set, which must be translated to a logical current position in the buffer, and new values must be computed for the cursor level and the keyboard language.

14 Tab, BackTab

The following applies when Tab and BackTab operate within a text control. The effect of Tab and BackTab may vary across implementations. We address a number of possible behaviors, and how they translate to an RTL context.

When Tab and BackTab only affect the cursor position (but do not cause addition of characters to the logical buffer), Tab must move the cursor in the direction corresponding to the paragraph embedding level (LTR if even, RTL if odd), and BackTab in the opposite direction. The cursor movement must be computed based on the visual position of the cursor. For example, if in an LTR context Tab would move the cursor from position 1 to 9 to 17 and so on, in a paragraph with an odd embedding level Tab would move the cursor from visual position 1 to 9 to 17 and so on, counting from the right side and proceeding to the left. This is a visual function. As with other visual functions, after moving the cursor with Tab or BackTab, the cursor level and the keyboard language must be computed for the new position.

When Tab causes insertion of characters in the logical buffer (a Tab character or a computed number of spaces), it must behave like entry of text (see section 15). This is a logical function which operates on the logical buffer. The number of spaces to insert must be based on the current position in the logical buffer.
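Both Tab behaviors can be sketched as below, assuming (hypothetically) a tab width of 8, which yields the stops 1, 9, 17 of the example, and a fixed-width display where positions are character cells. `next_tab_stop` counts from the leading edge of the paragraph, `column_from_left` converts that count to a screen column, and `spaces_for_tab` uses the logical column for the inserting variant. All names are illustrative.

```python
TAB_WIDTH = 8  # hypothetical: stops at visual positions 1, 9, 17, ... from the leading edge


def next_tab_stop(visual_pos: int, tab_width: int = TAB_WIDTH) -> int:
    """Next tab stop after the 1-based `visual_pos`, counted from the paragraph's
    leading edge (left edge for an even level, right edge for an odd level)."""
    return ((visual_pos - 1) // tab_width + 1) * tab_width + 1


def column_from_left(visual_pos: int, paragraph_level: int, line_width: int) -> int:
    """Map a position counted from the leading edge to a 1-based column from the left."""
    return visual_pos if paragraph_level % 2 == 0 else line_width - visual_pos + 1


def spaces_for_tab(logical_column: int, tab_width: int = TAB_WIDTH) -> int:
    """Spaces to insert when Tab inserts spaces, based on the 0-based logical column."""
    return tab_width - logical_column % tab_width
```

In an RTL paragraph on a 40-cell line, the stop at visual position 9 lies in screen column 32 counted from the left.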

Note: when Tab and/or BackTab are used to move from one control to another, this movement must be Right to Left and Top to Bottom if the whole screen (window, dialog) is localized for an RTL language.

15 Entry of Text

Entry of text is a logical function, which acts on the logical buffer exactly as for non-Bidi data. After any modification of the buffer, the visual order for presentation must be refreshed.

After typing a character, the cursor must appear adjacent to and logically after the newly entered character. In technical terms:

the cursor level will be set equal to the level of the just entered character;
the keyboard language will not change.

EXAMPLE 1 (typing English)

Buffer:         latin|
Levels:         000000
Display:        latin|

EXAMPLE 2 (typing Hebrew)

Buffer:         latinHEB|
Levels:         000001111
Display:        latin|BEH

EXAMPLE 3 (typing number within Hebrew)

Buffer:         latinHEBREW123|
Levels:         000001111112222
Display:        latin123|WERBEH

16 Backspace and Delete

Backspace and Delete are logical functions. They must delete the character logically before and after the text cursor, respectively. In LTR text, Backspace removes the character on the left of the cursor and Delete the one on the right; in RTL text, the sides are reversed. After a character is removed, the cursor must stand in place of the removed character.

Delete and Backspace should not remove a character that is visually far from the displayed cursor position. This can happen when the cursor is logically between characters with different Bidi levels. The algorithm is as follows:

If the cursor level is equal to the level of the character to be removed, perform the operation.
Otherwise, if the characters before and after the cursor have the same level, and this level has the same parity as the cursor level, perform the operation.
EXAMPLE (this could happen after removing a level-2 segment between "latin" and "continue")
```
Buffer:         latin|continue
Levels:         00000200000000
Display:        latin|continue
```
Otherwise, if the character to be removed has a level of the same parity as the cursor level, and this character is the character that would be jumped over by pressing Right Arrow (in the case of even parity and Delete, or odd parity and Backspace) or Left Arrow (in the case of odd parity and Delete, or even parity and Backspace), then perform the operation.
EXAMPLE with Delete (this could happen when upper level segments are split between 2 lines because the display width is narrow; in this example, pressing Delete will remove the "c" of continue, and the cursor will receive level 0)
```
Buffer:             latinHEBREW1234|continue
Levels:             000001111112222200000000
Display (line 1):   latinWERBEH
        (line 2):   1234|continue
```
EXAMPLE with Backspace (this could happen when upper level segments are split between 2 lines because the display width is narrow; in this example, pressing Backspace will remove the "4", and the cursor will receive level 2)
```
Buffer:             latinHEBREW1234|continue
Levels:             000001111112222000000000
Display (line 1):   latinWERBEH
        (line 2):   1234|continue
```
Otherwise, do not remove the character but make the cursor level equal to the level of the character to remove, and the keyboard language equal to the language of that character (or of its context if the character itself can be used in more than one language).
EXAMPLE with Delete
```
Buffer:         latinHEBREW|continue
Levels:         00000111111100000000
Display:        latin|WERBEHcontinue
```
Press Delete, cursor level is made equal to the level of "c", so that it is now 0.
```
New display:    latinWERBEH|continue
```
Press Delete again
```
New display:    latinWERBEH|ontinue
```
EXAMPLE with Backspace
```
Buffer:         latinHEBREW1234|continue
Levels:         000001111112222000000000
Display:        latin1234WERBEH|continue
```
Press Backspace, cursor level is made equal to the level of "4", so that it is now 2.
```
New display:    latin1234|WERBEHcontinue
```
Press Backspace again
```
New display:    latin123|WERBEHcontinue
```

Note: it is recommended to provide a user option by which users can choose whether Backspace and Delete should only move the cursor, without removing a character, when the character to be removed is visually far from the cursor position, or should remove that character anyway.

After the successful removal of a character by Delete or Backspace, the cursor level must be set to that of the removed character, and the keyboard language must be set to the language of that character (or of its context if the character itself can be used in more than one language).
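The decision algorithm of this section can be sketched in Python as follows. It is a sketch under stated assumptions, not a definitive implementation: `jumped_by_arrow` is a hypothetical callback into the display layer returning the logical index of the character that the given arrow would jump over, levels are assumed to come from the UBA, and the keyboard language update is omitted. The tests replay the examples above.

```python
from typing import Callable, Optional, Sequence, Tuple


def backspace_or_delete(
    levels: Sequence[int],
    pos: int,
    cursor_level: int,
    paragraph_level: int,
    key: str,
    jumped_by_arrow: Callable[[str], Optional[int]],
) -> Tuple[bool, int]:
    """Return (remove, new_cursor_level) for Backspace or Delete at logical position `pos`."""
    if key not in ("delete", "backspace"):
        raise ValueError(key)
    target = pos if key == "delete" else pos - 1
    if not 0 <= target < len(levels):
        return False, cursor_level  # nothing to remove
    target_level = levels[target]
    # Dummy characters at the paragraph level stand in at the buffer ends.
    before = levels[pos - 1] if pos > 0 else paragraph_level
    after = levels[pos] if pos < len(levels) else paragraph_level

    if cursor_level == target_level:
        remove = True
    elif before == after and before % 2 == cursor_level % 2:
        remove = True
    elif target_level % 2 == cursor_level % 2:
        even = cursor_level % 2 == 0
        # Right Arrow for even+Delete or odd+Backspace, Left Arrow otherwise.
        arrow = "right" if even == (key == "delete") else "left"
        remove = jumped_by_arrow(arrow) == target
    else:
        remove = False
    # In every case the cursor takes the level of the character to be removed.
    return remove, target_level
```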

17 Home and End

Home and End keys are logical functions which move the current logical position to before the first logical character (whatever its Bidi level) and to after the last logical character, respectively, of the line, sentence, paragraph etc. (according to whatever unit of text the non-Bidi Home and End functions relate to).

Besides moving the cursor position, Home and End must reset the cursor level to the paragraph embedding level, and the keyboard language according to that level (e.g. English for an even level, Arabic or Hebrew for an odd level).

EXAMPLE 1 (after entering Home)

Buffer:         |latin
Levels:         000000
Display:        |latin

EXAMPLE 2 (after entering Home)

Buffer:         |HEBREWlatin
Levels:         011111100000
Display:        |WERBEHlatin

EXAMPLE 3 (after entering Home)

Buffer:         |HEBREW
Levels:         1111111
Display:        WERBEH|

EXAMPLE 4 (after entering Home)

Buffer:         |123HEBREW
Levels:         1222111111
Display:        WERBEH123|

EXAMPLE 5 (after entering End)

Buffer:         latin|
Levels:         000000
Display:        latin|

EXAMPLE 6 (after entering End)

Buffer:         latinHEBREW|
Levels:         000001111110
Display:        latinWERBEH|

EXAMPLE 7 (after entering End)

Buffer:         HEBREW|
Levels:         1111111
Display:        |WERBEH

EXAMPLE 8 (after entering End)

Buffer:         HEBREW123|
Levels:         1111112221
Display:        |123WERBEH

18 Newline

Newline's effect is like moving to the next line and performing a Home function.

19 Effect of keyboard language change on the cursor level

When the current position in the logical buffer is between characters with opposite orientations, changing the keyboard language must change the cursor level to that of the adjacent character with the orientation corresponding to the new language.

Example 1:  logical buffer:   latin|HEBREW
            levels:           000000111111
            display:          latin|WERBEH

Changing the keyboard language to Hebrew will affect the cursor:

New state:  logical buffer:   latin|HEBREW
            levels:           000001111111
            display:          latinWERBEH|

From that state, changing the keyboard language to English would return to the initial state.

Example 2:  logical buffer:   latinHEBREW|continue
            levels:           00000111111100000000
            display:          latin|WERBEHcontinue

Changing the keyboard language to English will affect the cursor:

New state:  logical buffer:   latinHEBREW|continue
            levels:           00000111111000000000
            display:          latinWERBEH|continue

From that state, changing the keyboard language to Hebrew would return to the initial state.

When the current position in the logical buffer is between two characters of the same orientation, let us call lowest the lower of their two levels. If the new language corresponds to the orientation of lowest, changing the keyboard language must set the cursor level to lowest. If the new language corresponds to the opposite orientation, the new cursor level will be lowest minus 1 if lowest is greater than the paragraph embedding level, and lowest plus 1 otherwise.

Example 3:  logical buffer:   latin|text
            levels:           0000000000
            display:          latin|text

Changing the keyboard language to Hebrew will set the cursor level to 1. Changing it back to English will reset the cursor level to 0.

Example 4:  logical buffer:   latin|textHEBREW
            levels:           0000002222111111
            display:          latin|WERBEHtext

Changing the keyboard language to Hebrew will set the cursor level to 1.

New state:  logical buffer:   latin|textHEBREW
            levels:           0000012222111111
            display:          latinWERBEHtext|

From that state, changing the keyboard language to English would return to the initial state.

Example 5:  logical buffer:   latin|textHEBREW
            levels:           0000022222111111
            display:          latinWERBEH|text

Changing the keyboard language to Hebrew will set the cursor level to 1.

New state:  logical buffer:   latin|textHEBREW
            levels:           0000012222111111
            display:          latinWERBEHtext|

From that state, changing the keyboard language to English would set the cursor to level 0, as below:

New state:  logical buffer:   latin|textHEBREW
            levels:           0000002222111111
            display:          latin|WERBEHtext

Example 6:  logical buffer:   HEBREW|MORE
            levels:           11111111111
            display:          EROM|WERBEH

Changing the keyboard language to English will set the cursor level to 0 if the paragraph embedding level is 0, and to 2 if the paragraph embedding level is 1.
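The rules of this section, including the orientation tests, can be condensed into one function. In this hypothetical Python sketch, the dummy characters of the programming model stand in for missing neighbors at the buffer ends, and `new_language_is_rtl` tells whether the new keyboard language is RTL. The tests replay Examples 1 to 4 and 6.

```python
from typing import Sequence


def level_after_language_change(
    levels: Sequence[int], pos: int, paragraph_level: int, new_language_is_rtl: bool
) -> int:
    """New cursor level when the keyboard language changes at logical position `pos`."""
    # Dummy characters at the paragraph embedding level stand in at the buffer ends.
    before = levels[pos - 1] if pos > 0 else paragraph_level
    after = levels[pos] if pos < len(levels) else paragraph_level
    wanted_parity = 1 if new_language_is_rtl else 0
    if before % 2 != after % 2:
        # Opposite orientations: take the level of the neighbor matching the new language.
        return before if before % 2 == wanted_parity else after
    lowest = min(before, after)
    if lowest % 2 == wanted_parity:
        return lowest
    # Opposite orientation requested: step one level down if possible, else up.
    return lowest - 1 if lowest > paragraph_level else lowest + 1
```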

20 Visualisation of a boundary situation

A boundary situation is said to exist when the cursor level is not equal to the level of the character at the current position. Its visual manifestation is that the text cursor appears in a position remote from that of the next existing logical character.

Example:    logical buffer:   latin|HEBREW
            levels:           000000111111
            display:          latin|WERBEH

The cursor seems to precede "W", but in fact it logically precedes "H". Because this is surprising, it is required to give the user a visual hint that a boundary situation exists. This may be done by changing the appearance of the cursor, for instance:

changing the thickness of the cursor
changing the length of the cursor
changing the color of the cursor
changing from blinking to not-blinking or vice-versa, or changing the blinking frequency
changing the position of the cursor relative to the base line
displaying a second cursor adjacent to the character at the current position (on the right of H in our example).
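Detecting a boundary situation only requires comparing the cursor level with the level of the character at the current position. A hypothetical Python sketch, using a dummy character at the paragraph embedding level when the cursor is at the end of the buffer:

```python
from typing import Sequence


def is_boundary(levels: Sequence[int], pos: int, cursor_level: int, paragraph_level: int) -> bool:
    """True when the cursor level differs from the level of the character at `pos`."""
    current = levels[pos] if pos < len(levels) else paragraph_level
    return cursor_level != current
```

In the example above ("latin|HEBREW" with cursor level 0), the character at the current position is "H" with level 1, so a boundary situation exists.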

21 Visualisation of keyboard language

The value of the keyboard language is critical for text entry, thus it is required to keep the user aware of it. A typical Bidi user rarely uses more than 2 languages in the same paragraph, one for LTR text and one for RTL text, so it is enough to have an indication of the direction implied by the current keyboard language. This may be done by changing the appearance of the cursor, for instance:

changing the shape of the cursor to add a little arrow pointing to the current direction (see implementation in Windows)
the changes mentioned in the previous section about boundary situations.

22 Selection

The problem with selection in a Bidi environment arises when the selection starts in LTR text and ends in RTL text, or vice versa. Let us take as an example the string displayed as "latinIDIB" (logical buffer "latinBIDI"). Let us assume that the start point of selection is between "a" and "t" and the end point is between "I" and "B". There are 3 possible approaches:

Visual approach: the highlighted text will be "tinIDI". The corresponding selected text in the text buffer is NOT contiguous.
Logical approach: the selected text will be from "t" to "B". The highlighting will affect 2 parts: "tin" and "B" (not contiguous highlighting).
Despotic approach: whenever a selection would create non-contiguous highlighting or non-contiguous selected text, the selection is extended automatically to the minimum that will still have everything contiguous. In that case, that would be "tinIDIB".

The despotic approach lacks flexibility and is irritating for users. The visual approach is more intuitive and predictable (the selected portion does not jump around as it does with the logical approach). However, the logical approach is the one that makes sense when the selection is initiated by a program, for instance to highlight a search expression found in the text.

Logical selection is required when the selection is initiated by a program, and for manual selection (using arrow keys or mouse) if the user prefers it. Implementing visual selection is recommended to satisfy users who prefer this type of manual selection.

Logical Selection

Logical Selection is defined as the text within the logical buffer between the logical position of the start point and the logical position of the end point.

Visual Selection

Visual Selection is defined as the text displayed between the start and the end points. When an operation (e.g. copy, move, delete, set attribute) is done on the selection, the UI software must identify what parts of the logical buffer are displayed between these points, because they are the ones which are affected by this operation.

Note 1: the result of an operation performed on visual selection (e.g. copy and paste) may appear quite different from the selected text in its original location, depending on the context of the original location and the context of the destination location.

Note 2: it is recommended that the UI give some sort of warning if a visual selection corresponds to non-consecutive text in the logical buffer.
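To make the visual approach concrete, here is a minimal Python sketch. It assumes that the layout engine can supply `order`, the logical index of the character shown in each display column from left to right; the function names are hypothetical. `visual_selection` finds the logical characters displayed between two visual boundaries, `is_contiguous` tells whether they form a single run in the logical buffer (the situation in which Note 2 recommends a warning), and `highlighted_columns` shows, for comparison, which columns a logical selection highlights.

```python
def visual_selection(order, x_start, x_end):
    """Logical indices, in logical order, of the characters displayed between
    two visual boundaries (boundary 0 is the left edge of the line)."""
    left, right = sorted((x_start, x_end))
    return sorted(order[left:right])


def is_contiguous(indices):
    """True if the sorted logical indices form a single run in the buffer."""
    return not indices or indices[-1] - indices[0] + 1 == len(indices)


def highlighted_columns(order, start, end):
    """Display columns highlighted by a logical selection [start, end)."""
    return [x for x, i in enumerate(order) if start <= i < end]


# "latinBIDI" is displayed as "latinIDIB": columns 5..8 show logical 8, 7, 6, 5.
order = [0, 1, 2, 3, 4, 8, 7, 6, 5]
selected = visual_selection(order, 2, 8)       # visual highlight "tinIDI"
print(selected, is_contiguous(selected))       # [2, 3, 4, 6, 7, 8] False
print(highlighted_columns(order, 2, 6))        # logical "tinB" -> [2, 3, 4, 8]
```

The gap at logical index 5 (the "B") is exactly what makes the visual selection non-contiguous in the buffer, while the logical selection of the same endpoints highlights columns 2 to 4 and 8.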

When a piece of text has been selected, it may be contained within one line, or span more than one line.

Single Line Visual Selection

Only the text displayed between the start point and the end point belongs to the selection.

Multi-Line Visual Selection

The selection is delimited by two points in the text. In the case of a multi-line selection, these points belong to different lines. Let us call "start point" the point belonging to the line appearing first, and "end point" the point belonging to the line appearing last. The whole selection can be decomposed into:

text on the same line as the start point (start line)
text on the same line as the end point (end line)
text on intermediate lines (there may be zero, one or more such lines).

Start Line

The text belonging to the selection is whatever appears from the start point until the end of the line. The end of the line is its rightmost character if the current paragraph embedding level is even (LTR paragraph), or its leftmost character if the current paragraph embedding level is odd (RTL paragraph).

End Line

The text belonging to the selection is whatever appears from the beginning of the line until the end point. The beginning of the line is its leftmost character if the current paragraph embedding level is even (LTR paragraph), or its rightmost character if the current paragraph embedding level is odd (RTL paragraph).

Intermediate Lines

The text belonging to the selection is the whole of each line.
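The decomposition above can be sketched in Python as follows. This is an illustration only: it assumes that every line of the selection belongs to one paragraph with a known embedding level, that points are given as (line, visual boundary) pairs with boundary 0 at the left edge of the line, and that the start point is on an earlier line than the end point. The function name is hypothetical.

```python
def multiline_visual_selection(line_widths, start, end, paragraph_level):
    """Return (line, left, right) column ranges, right exclusive, selected
    between start = (line, x) and end = (line, x)."""
    (start_line, start_x), (end_line, end_x) = start, end
    ltr = paragraph_level % 2 == 0
    if start_line == end_line:
        # Single line: only what is displayed between the two points.
        return [(start_line, min(start_x, end_x), max(start_x, end_x))]
    ranges = []
    # Start line: from the start point to the end of the line
    # (right end in an LTR paragraph, left end in an RTL paragraph).
    if ltr:
        ranges.append((start_line, start_x, line_widths[start_line]))
    else:
        ranges.append((start_line, 0, start_x))
    # Intermediate lines are selected whole.
    for line in range(start_line + 1, end_line):
        ranges.append((line, 0, line_widths[line]))
    # End line: from the beginning of the line to the end point
    # (left end in an LTR paragraph, right end in an RTL paragraph).
    if ltr:
        ranges.append((end_line, 0, end_x))
    else:
        ranges.append((end_line, end_x, line_widths[end_line]))
    return ranges


print(multiline_visual_selection([10, 8, 12], (0, 3), (2, 5), 1))
# [(0, 0, 3), (1, 0, 8), (2, 5, 12)]
```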

23 Arabic Ligatures

We address here only the basic Arabic ligatures which are composed of the letter Lam followed logically by the letter Alef (or variants of Alef like Alef with Hamza above or Alef with Madda above).

When a Lam is followed by an Alef, the two characters must be processed logically as two characters, although they are displayed as one glyph.

If the current position is before the Lam and the user presses Delete, the Lam will be deleted and the Lamalef glyph will be replaced by an Alef glyph.
If the current position is after the Alef and the user presses Backspace, the Alef will be deleted and the Lamalef glyph will be replaced by a Lam glyph.

Visually, the Lamalef glyph must be processed like two glyphs with no space between. The cursor may appear before the Lamalef, after the Lamalef, or in the middle of the Lamalef. When the cursor appears in the middle of the Lamalef, the current position in the logical buffer is between Lam and Alef.

If the cursor is before the Lamalef and the user presses left arrow, the cursor will appear in the middle of the Lamalef. If the user presses left arrow again, the cursor will appear after the Lamalef.
If the cursor is after the Lamalef and the user presses right arrow, the cursor will appear in the middle of the Lamalef. If the user presses right arrow again, the cursor will appear before the Lamalef.
It is possible to position the cursor in the middle of a Lamalef using the mouse, or up/down arrows, or PageUp/PageDown.
It is possible to select only the right part or only the left part of the Lamalef glyph, either using arrow keys or the mouse. The right part corresponds to selection of the Lam; the left part corresponds to selection of the Alef.
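On the logical side these rules are simple, because the buffer always holds two characters. The following Python sketch, with hypothetical function names, tests whether a logical position lies in the middle of a Lamalef and shows that Delete and Backspace each remove a single character. The set of Alef variants is an assumption based on the Lam-Alef ligatures defined in Unicode (Alef, Alef with Madda above, Alef with Hamza above, Alef with Hamza below).

```python
LAM = "\u0644"
# Alef, Alef with Madda above, Alef with Hamza above, Alef with Hamza below:
# the characters that form a Lam-Alef ligature when they follow a Lam.
ALEF_VARIANTS = {"\u0627", "\u0622", "\u0623", "\u0625"}


def inside_lamalef(text, pos):
    """True if logical position pos lies between the Lam and the Alef of a
    Lamalef, so the cursor must appear in the middle of the glyph."""
    return 0 < pos < len(text) and text[pos - 1] == LAM and text[pos] in ALEF_VARIANTS


def delete(text, pos):
    """Delete removes only the character after pos."""
    if pos >= len(text):
        return text, pos
    return text[:pos] + text[pos + 1:], pos


def backspace(text, pos):
    """Backspace removes only the character before pos."""
    if pos == 0:
        return text, pos
    return text[:pos - 1] + text[pos:], pos - 1


lamalef = LAM + "\u0627"           # displayed as one Lamalef glyph
print(inside_lamalef(lamalef, 1))  # True: the cursor is drawn mid-glyph
print(delete(lamalef, 0))          # Lam deleted, only the Alef remains
print(backspace(lamalef, 2))       # Alef deleted, only the Lam remains
```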

Note: a good implementation of these requirements for Arabic ligatures can be seen in Notepad under Arabic Windows NT4, or Windows 2000. It is good to know that when the keyboard language is Arabic, the "G" key generates a Lam, the "H" key generates an Alef, and the "B" key generates Lam followed by Alef.

24 Conversion of cursor logical position to visual position

Knowing that the text cursor is at a certain point in the text buffer, where should it be displayed on the screen? The guiding principle is that the cursor should hint where the next text operation (typing, delete, backspace) is going to take place. We assume a vertical cursor which is displayed between characters. The special case of the Home and End positions can be solved by handling it as if there were a dummy character, with a level equal to the paragraph embedding level, before the line and another one after the line (or before and after whatever other unit of text Home and End relate to).

This section uses the concept of Bidi level of the cursor. There is no requirement to implement this concept, as long as the results (placement of the text cursor) are identical to what is specified here.

Here is a reminder about the Bidi level of the cursor. It can be set as follows:
1. After typing, this is the Bidi level of the last character typed.
2. After moving over a character with left/right arrow, this is the Bidi level of the last moved over character.
3. After Delete or Backspace, the cursor level is set to that of the removed character.
4. After Home, End and Newline, the cursor level is set to the paragraph embedding level.
5. Setting the keyboard language to English sets the cursor to an even level. Setting the keyboard language to Arabic or Hebrew sets the cursor to an odd level.
6. After up/down arrow, PageUp/Down, mouse click, the cursor level is recalculated to the lower level of the 2 surrounding characters. An even cursor level implies English language for the keyboard. An odd level implies Arabic/Hebrew language for the keyboard.
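Item 6 above, together with the mapping from level parity to keyboard language, can be sketched in Python as below. The function names are hypothetical, and the treatment of the line ends is an assumption: a missing neighbour is treated as a dummy character at the paragraph embedding level, in the same way as the Home and End positions are handled for display.

```python
def level_after_reposition(levels, pos, paragraph_level):
    """Cursor level after up/down arrow, PageUp/PageDown or a mouse click:
    the lower level of the characters on both sides of logical position pos.
    Assumption: at a line end, the missing neighbour counts as a dummy
    character at the paragraph embedding level."""
    prev_level = levels[pos - 1] if pos > 0 else paragraph_level
    next_level = levels[pos] if pos < len(levels) else paragraph_level
    return min(prev_level, next_level)


def keyboard_language(cursor_level):
    """An even cursor level implies English, an odd one Arabic or Hebrew."""
    return "English" if cursor_level % 2 == 0 else "Arabic/Hebrew"


levels = [0, 0, 0, 1, 1, 1]                   # e.g. "abcDEF" in an LTR paragraph
level = level_after_reposition(levels, 4, 0)  # click between D and E
print(level, keyboard_language(level))        # 1 Arabic/Hebrew
```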
If the characters before and after the cursor have the same Bidi level, there is no problem: the 2 characters are displayed adjacently on the screen, and the cursor will be displayed between the 2.
If the characters before and after the cursor have different levels, the cursor may have a level equal to one of those characters, or different from both.
1. If the cursor level is equal to the level of the previous character, the cursor must be displayed after this character ("after" is on the right for even levels, on the left for odd levels).
```
Example 1:  logical buffer:   latin|HEBREW
            levels:           000000111111
            display:          latin|WERBEH

Example 2:  logical buffer:   HEBREW|latin
            levels:           111111100000
            display:          |WERBEHlatin

Example 3:  logical buffer:   latin|1234HEBREW
            levels:           0000002222111111
            display:          latin|WERBEH1234

Example 4:  logical buffer:   HEBREW|latin
            levels:           111111122222
            display:          latin|WERBEH

Example 5:  logical buffer:   latinHEBREW1234|MORE
            levels:           00000111111222221111
            display:          latinEROM1234|WERBEH

Example 6:  logical buffer:   latinHEBREW1234|continue
            levels:           000001111112222200000000
            display:          latin1234|WERBEHcontinue
```
2. If the cursor level is equal to the level of the next character, the cursor must be displayed before this character ("before" is on the left for even levels, on the right for odd levels).
```
Example 1:  logical buffer:   latin|HEBREW
            levels:           000001111111
            display:          latinWERBEH|

Example 2:  logical buffer:   HEBREW|latin
            levels:           111111000000
            display:          WERBEH|latin

Example 3:  logical buffer:   latin|1234HEBREW
            levels:           0000022222111111
            display:          latinWERBEH|1234

Example 4:  logical buffer:   HEBREW|latin
            levels:           111111222222
            display:          |latinWERBEH

Example 5:  logical buffer:   latinHEBREW1234|MORE
            levels:           00000111111222211111
            display:          latinEROM|1234WERBEH

Example 6:  logical buffer:   latinHEBREW1234|continue
            levels:           000001111112222000000000
            display:          latin1234WERBEH|continue
```
3. If the cursor level is smaller than the levels of both previous and next characters, it must be displayed as if it was equal to the smaller of the two.
```
Example:    logical buffer:   HEBREW|latin
            levels:           111111022222
            display:          latin|WERBEH
```
4. If the cursor level is greater than the levels of both previous and next characters, it must be displayed as if it was equal to the greater of the two.
```
Example:    logical buffer:   HEBREW|latin
            levels:           111111322222
            display:          |latinWERBEH
```
5. If the cursor level is greater than the level of the previous character and smaller than the level of the next character and has the same parity (even or odd) as the previous character, it must be displayed after (to the right for even parity, to the left for odd parity) the previous character.
```
Example:    logical buffer:   HEBREW|latin
            levels:           111111344444
            display:          latin|WERBEH
```
6. If the cursor level is greater than the level of the previous character and smaller than the level of the next character and has the same parity (even or odd) as the next character, it must be displayed before (to the left for even parity, to the right for odd parity) the next character.
```
Example:    logical buffer:   HEBREW|latin
            levels:           111111244444
            display:          |latinWERBEH
```
7. If the cursor level is greater than the level of the previous character and smaller than the level of the next character, and if the cursor level is even and the levels of both previous and next characters are odd, then scan the text following the cursor position until (but not including) the first character with a level less than or equal to the cursor level. The cursor must be displayed to the left of the leftmost character in this range.
```
Example:    logical buffer:   HEBREW|MORE RTLlatin
            levels:           11111123333333322222
            display:          |LTR EROMlatinWERBEH
```
8. If the cursor level is greater than the level of the previous character and smaller than the level of the next character, and if the cursor level is odd and the levels of both previous and next characters are even, then scan the text following the cursor position until (but not including) the first character with a level less than or equal to the cursor level. The cursor must be displayed to the right of the rightmost character in this range.
```
Example:    logical buffer:   latin|more ltrHEBREW
            levels:           00000122222222111111
            display:          latinWERBEHmore ltr|
```
9. If the cursor level is smaller than the level of the previous character and greater than the level of the next character and has the same parity (even or odd) as the previous character, it must be displayed after (to the right for even parity, to the left for odd parity) the previous character.
```
Example:    logical buffer:   HEBREW|latin
            levels:           333333100000
            display:          |WERBEHlatin
```
10. If the cursor level is smaller than the level of the previous character and greater than the level of the next character and has the same parity (even or odd) as the next character, it must be displayed before (to the left for even parity, to the right for odd parity) the next character.
```
Example:    logical buffer:   textHEBREW|latin
            levels:           4444333333200000
            display:          WERBEHtext|latin
```
11. If the cursor level is smaller than the level of the previous character and greater than the level of the next character, and if the cursor level is even and the levels of both previous and next characters are odd, then scan the text preceding the cursor position until (but not including) the first character with a level less than or equal to the cursor level. The cursor must be displayed to the right of the rightmost character in this range.
```
Example:    logical buffer:   HEBREW|MORE RTLlatin
            levels:           33333321111111122222
            display:          latinLTR EROMWERBEH|
```
12. If the cursor level is smaller than the level of the previous character and greater than the level of the next character, and if the cursor level is odd and the levels of both previous and next characters are even, then scan the text preceding the cursor position until (but not including) the first character with a level less than or equal to the cursor level. The cursor must be displayed to the left of the leftmost character in this range.
```
Example:    logical buffer:   latin|more ltrHEBREW
            levels:           22222100000000111111
            display:          |latinmore ltrWERBEH
```
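The twelve cases can be condensed into one function. The following Python sketch is one possible implementation, not a reference one: the function names are hypothetical, the display order is computed from the resolved levels with a straightforward version of rule L2 of the Unicode Bidirectional Algorithm, all character levels are assumed to be at least the paragraph embedding level, and the line start and end are modeled with dummy characters at the paragraph embedding level, as suggested at the beginning of this section. `render_example` parses the notation used in the examples above, where the digit under `|` is the cursor level.

```python
def visual_order(levels):
    """Logical indices in left-to-right display order (rule L2 of the UBA):
    reverse every run at or above each level, from the highest level down to
    the lowest odd level."""
    order = list(range(len(levels)))
    if not levels:
        return order
    for level in range(max(levels), (min(levels) | 1) - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] < level:
                i += 1
                continue
            j = i
            while j < len(order) and levels[order[j]] >= level:
                j += 1
            order[i:j] = order[i:j][::-1]
            i = j
    return order


def cursor_x(levels, pos, cursor_level, paragraph_level=0):
    """Visual boundary (0 = left edge) where the cursor is drawn for logical
    position pos (0..len(levels)), following cases 1 to 12."""
    # Dummy characters at the paragraph level stand for the line start and end.
    ext = [paragraph_level] + list(levels) + [paragraph_level]
    vis = [0] * len(ext)
    for x, i in enumerate(visual_order(ext)):
        vis[i] = x

    def after(i):    # right edge at even levels, left edge at odd levels
        return vis[i] + 1 if ext[i] % 2 == 0 else vis[i]

    def before(i):   # left edge at even levels, right edge at odd levels
        return vis[i] if ext[i] % 2 == 0 else vis[i] + 1

    p, n = pos, pos + 1              # previous and next character in ext
    a, b, c = ext[p], ext[n], cursor_level
    if a == b or c == a:             # same levels, or case 1
        x = after(p)
    elif c == b:                     # case 2
        x = before(n)
    elif c < min(a, b):              # case 3: behave as the smaller level
        x = after(p) if a < b else before(n)
    elif c > max(a, b):              # case 4: behave as the greater level
        x = after(p) if a > b else before(n)
    elif c % 2 == a % 2:             # cases 5 and 9
        x = after(p)
    elif c % 2 == b % 2:             # cases 6 and 10
        x = before(n)
    else:                            # cases 7, 8 (scan forward), 11, 12 (backward)
        k, step = (n, 1) if a < c else (p, -1)
        columns = []
        while ext[k] > c:            # stops at the first level <= c (at the latest, a dummy)
            columns.append(vis[k])
            k += step
        use_left_edge = (c % 2 == 0) == (a < c)   # cases 7 and 12
        x = min(columns) if use_left_edge else max(columns) + 1
    return x - 1                     # drop the column of the leading dummy


def render(chars, levels, pos, cursor_level, paragraph_level=0):
    """Displayed line with '|' inserted at the cursor position."""
    shown = "".join(chars[i] for i in visual_order(levels))
    x = cursor_x(levels, pos, cursor_level, paragraph_level)
    return shown[:x] + "|" + shown[x:]


def render_example(buffer, digits):
    """Parse the notation of the examples: '|' marks the cursor and the digit
    in the same column is the cursor level."""
    pos = buffer.index("|")
    levels = [int(d) for d in digits[:pos] + digits[pos + 1:]]
    return render(buffer.replace("|", ""), levels, pos, int(digits[pos]))


print(render_example("HEBREW|MORE RTLlatin", "11111123333333322222"))
# |LTR EROMlatinWERBEH
```

Cases 5 and 9, and cases 6 and 10, can share a branch because in each pair the cursor attaches to the neighbour whose parity it shares; only cases 7, 8, 11 and 12 need to look beyond the two adjacent characters.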

Note: these guidelines for positioning the cursor are rather complex to master, but fairly straightforward to implement. This document covers all possibilities, but not all cases are relevant in all situations. For instance, cases 5 to 7 and 9 to 11 can occur only if the text control provides a means of specifying explicit embeddings.

25 Guidelines for Programming Editors

Under the name "Programming Editor", we consider a type of textual application used to write a formal language (like a programming language, or HTML) where most syntactic elements of the language (keywords, identifiers) are in English, while character values and comments may be in any language. This section defines what particular guidelines apply to programming editors.

Assumptions

We assume that all keywords and identifiers will be written with Latin characters. Only character literals and comments may be written in Arabic or Hebrew.

Paragraph embedding level

Each line of text is considered an independent paragraph. The paragraph embedding level is always 0.

Segmentation

Each character literal or comment containing RTL characters must be formatted independently. Syntactical delimiters must not be considered part of the literal or comment.

Example:     String strArray = { "abc de", "HEB 123", "REW fgh" };

In that case, at least 5 independent formatting operations are needed:

Segment 1:     String strArray = { "abc de", "
Segment 2:     HEB 123
Segment 3:     ", "
Segment 4:     REW fgh
Segment 5:     " };

The 5 segments must be laid out on the display line from left to right. The display will be:

             String strArray = { "abc de", "123 BEH", "fgh WER" };

Example of comment:     x = y;   /* THIS IS AN ASSIGNMENT.  */
Segment 1:     x = y;   /*
Segment 2:      THIS IS AN ASSIGNMENT.
Segment 3:     */

The display will be:    x = y;   /*  .TNEMNGISSA NA SI SIHT */
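A possible segmentation routine is sketched below in Python. It is only a sketch under simplifying assumptions: it recognizes only double-quoted literals and `/* */` comments on a single line, ignores escaped quotes, and by default detects RTL characters through their Unicode bidirectional class (R or AL); `segment_line` and `is_rtl_char` are hypothetical names. Text that needs no separate formatting, including the delimiters, stays in the surrounding segment.

```python
import unicodedata


def is_rtl_char(ch):
    """Strong RTL character: Unicode bidirectional class R or AL."""
    return unicodedata.bidirectional(ch) in ("R", "AL")


def segment_line(line, is_rtl=is_rtl_char):
    """Split a source line so that the body of every double-quoted literal or
    /* */ comment containing RTL text becomes a segment of its own; the
    delimiters stay in the neighbouring segments."""
    segments = []
    segment_start = 0
    i = 0
    while i < len(line):
        if line[i] == '"':
            delim_len, close = 1, line.find('"', i + 1)
        elif line.startswith("/*", i):
            delim_len, close = 2, line.find("*/", i + 2)
        else:
            i += 1
            continue
        if close == -1:
            break  # unterminated literal or comment: leave the rest as is
        body_start = i + delim_len
        if any(is_rtl(ch) for ch in line[body_start:close]):
            segments.append(line[segment_start:body_start])
            segments.append(line[body_start:close])
            segment_start = close
        i = close + delim_len  # opening and closing delimiters have equal length
    segments.append(line[segment_start:])
    return segments
```

Passing `is_rtl=str.isupper` follows this document's convention of uppercase letters standing for RTL letters: `segment_line('String strArray = { "abc de", "HEB 123", "REW fgh" };', is_rtl=str.isupper)` returns the five segments listed above, and the literal "abc de" stays inside segment 1 because it contains no RTL text.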

Base embedding level for segments containing RTL text

Files created by a programming editor are generally used as input to create an application which uses and displays the text strings defined in the source. The programming editor should display RTL segments as similarly as possible to how they will be displayed by the target application.

Whenever possible, the programming editor should try to infer what base embedding level will be used in the target application to display each segment of text, and use the same embedding level to display it in the programming editor itself.

An example where such an inference is valid is when the target application uses Java Swing controls for its UI. Java Swing controls use Contextual LTR orientation by default, so it is a good bet to use Contextual LTR orientation to display text segments within the editor.
If the programming editor environment does not explicitly support contextual orientation (as is the case for Windows), it is possible to simulate contextual orientation as follows:

scan the text until a "strong" character is found
if an RTL strong character was found, set the base embedding level for this segment to 1; otherwise, set it to 0.
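The two steps above can be sketched in Python with the standard `unicodedata` module, whose `bidirectional()` function returns the Unicode bidirectional class of a character. Treating the classes L, R and AL as the strong classes follows the Unicode Bidirectional Algorithm; the function name is hypothetical.

```python
import unicodedata


def contextual_base_level(segment):
    """Simulated contextual LTR orientation: 1 if the first strong character
    of the segment is RTL, otherwise 0 (including when there is none)."""
    for ch in segment:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return 1
        if bidi_class == "L":
            return 0
    return 0


print(contextual_base_level("123 \u05e9\u05dc\u05d5\u05dd abc"))  # 1: first strong char is Hebrew
print(contextual_base_level("abc \u05e9\u05dc\u05d5\u05dd"))      # 0
```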

26 Appendix: Alternative Guidelines for a Simplified Implementation

The following subsections describe alternative guidelines for some of the UI functions. They are not recommended. However, if implementors cannot implement the main guidelines and choose to follow alternative guidelines that are easier to implement, they must conform to the specifications in this appendix.

Logical Approach for Left and Right Arrows

The logical approach is sometimes preferred by implementors, mostly because its implementation is easier. This is not a recommended solution.

In the logical approach, the Left and Right Arrow act on the current position in the logical buffer. Their effect depends on the embedding level of the current paragraph.

When the paragraph embedding level is even (LTR paragraph), Left Arrow moves the current position one character backward in the logical buffer, Right Arrow moves the current position one character forward in the logical buffer.
When the paragraph embedding level is odd (RTL paragraph), Left Arrow moves the current position one character forward in the logical buffer, Right Arrow moves the current position one character backward in the logical buffer.

Moving over a character using the left or right arrow key must set the cursor Bidi level to the level of that character and the keyboard language to the most appropriate for that level.

After each such operation, the text cursor must be redisplayed according to the new logical position.
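A minimal Python sketch of the logical approach, assuming the resolved levels of the current line are available; the function names are hypothetical. It returns the new logical position and the new cursor level; redisplaying the cursor is left to the rules of section "Conversion of cursor logical position to visual position".

```python
def logical_arrow(key, pos, levels, paragraph_level, cursor_level):
    """Return (new_pos, new_cursor_level) for key 'left' or 'right'."""
    # Right moves forward in an LTR paragraph, Left moves forward in an RTL one.
    forward = (key == "right") == (paragraph_level % 2 == 0)
    if forward and pos < len(levels):
        return pos + 1, levels[pos]        # moved over the character at pos
    if not forward and pos > 0:
        return pos - 1, levels[pos - 1]    # moved over the character before pos
    return pos, cursor_level               # at a line boundary: nothing moved


def keyboard_for_level(level):
    """Most appropriate keyboard language for a cursor level."""
    return "English" if level % 2 == 0 else "Arabic/Hebrew"


levels = [0, 0, 1, 1]                       # e.g. "abCD"
print(logical_arrow("left", 1, levels, 1, 0))  # (2, 0): forward in an RTL paragraph
```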

Logical Approach for Next/Previous Word

The logical approach is sometimes preferred by implementors, mostly because its implementation is easier. This is not a recommended solution.

In the logical approach, the Next/Previous Word functions act on the current position in the logical buffer. Next Word moves the current position forward in the logical buffer, Previous Word moves it backward. The visual effect depends on the embedding level of the current sequence of words.

Within a sequence of words with an even embedding level (LTR sequence), Next Word moves the cursor to the nearest word on the right, and Previous Word moves it to the nearest word on the left.
Within a sequence of words with an odd embedding level (RTL sequence), Next Word moves the cursor to the nearest word on the left, and Previous Word moves it to the nearest word on the right.

In all cases, the cursor must appear before the first character of the word, which means on its left for an even embedding level, and on its right for an odd embedding level.

The cursor level must be set to the level of the first logical character of the word.

The keyboard language should automatically change, if needed, to match the language of the word at the new position of the cursor: English for an LTR word, Arabic or Hebrew for an RTL word.
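The logical approach for word movement can be sketched in Python as follows. The word definition (a maximal run of non-whitespace characters) is an assumption made for illustration, and the function names are hypothetical; real editors usually apply richer word-break rules.

```python
def next_word(text, pos):
    """Logical position of the start of the next word (assumption: words are
    maximal runs of non-whitespace characters)."""
    n = len(text)
    while pos < n and not text[pos].isspace():
        pos += 1
    while pos < n and text[pos].isspace():
        pos += 1
    return pos


def previous_word(text, pos):
    """Logical position of the start of the word before pos."""
    while pos > 0 and text[pos - 1].isspace():
        pos -= 1
    while pos > 0 and not text[pos - 1].isspace():
        pos -= 1
    return pos


def word_move(text, levels, pos, forward, cursor_level):
    """Return (new_pos, new_cursor_level); the cursor level becomes the level
    of the first logical character of the word reached."""
    new_pos = next_word(text, pos) if forward else previous_word(text, pos)
    if new_pos < len(text):
        return new_pos, levels[new_pos]
    return new_pos, cursor_level  # end of the text: no word, level unchanged


text = "abc DEF ghi"                               # uppercase stands for RTL
levels = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
print(word_move(text, levels, 0, True, 0))         # (4, 1): start of "DEF"
```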

Simplified Conversion of cursor logical position to visual position

In some situations, a simplified implementation may be preferred, mainly to save implementation work, and is usually associated with the logical approach for the Left and Right Arrow keys. This is not a recommended solution.

Here are a few principles for such an implementation:

The cursor level depends only on the current position in the logical buffer.
The cursor level is equal to the level of the character preceding the current position.
If the current position is at the beginning of a line, the cursor level is equal to the level of the first character of the line.

It follows from these principles that the cursor must be displayed after (to the right for even parity, to the left for odd parity) the character preceding the current position.

If the current position is at the beginning of a line, the cursor must be displayed before (to the left for even parity, to the right for odd parity) the first character of the line.

```
Example 1:  logical buffer:   latin|HEBREW
            levels:           000000111111
            display:          latin|WERBEH

Example 2:  logical buffer:   HEBREW|latin
            levels:           111111100000
            display:          |WERBEHlatin

Example 3:  logical buffer:   latinHEBREW|
            levels:           000001111111
            display:          latin|WERBEH

Example 4:  logical buffer:   latinHEBREW1234|
            levels:           0000011111122222
            display:          latin1234|WERBEH

Example 5:  logical buffer:   |HEBREWlatin
            levels:           111111100000
            display:          WERBEH|latin

Example 6:  logical buffer:   latin|HEBREW
            levels:           222222111111
            display:          WERBEHlatin|

Example 7:  logical buffer:   HEBREW|latin
            levels:           111111122222
            display:          latin|WERBEH

Example 8:  logical buffer:   HEBREWlatin|
            levels:           111111222222
            display:          latin|WERBEH

Example 9:  logical buffer:   |latinHEBREW
            levels:           222222111111
            display:          WERBEH|latin
```
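The simplified rules can be checked against these examples with the following Python sketch. The display order is computed with a straightforward version of rule L2 of the Unicode Bidirectional Algorithm; the function names are hypothetical, and `render_example` parses the same notation as the examples (the digit under `|` is skipped, since the simplified rules derive the cursor level from the buffer itself).

```python
def visual_order(levels):
    """Logical indices in left-to-right display order (rule L2 of the UBA)."""
    order = list(range(len(levels)))
    if not levels:
        return order
    for level in range(max(levels), (min(levels) | 1) - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] < level:
                i += 1
                continue
            j = i
            while j < len(order) and levels[order[j]] >= level:
                j += 1
            order[i:j] = order[i:j][::-1]
            i = j
    return order


def simplified_cursor_x(levels, pos):
    """Visual boundary (0 = left edge) under the simplified rules."""
    if not levels:
        return 0
    vis = [0] * len(levels)
    for x, i in enumerate(visual_order(levels)):
        vis[i] = x
    if pos == 0:
        # Beginning of the line: before the first character
        # (left side at an even level, right side at an odd level).
        return vis[0] if levels[0] % 2 == 0 else vis[0] + 1
    # Otherwise: after the character preceding pos
    # (right side at an even level, left side at an odd level).
    i = pos - 1
    return vis[i] + 1 if levels[i] % 2 == 0 else vis[i]


def render_example(buffer, digits):
    """Parse the example notation; the digit under '|' is not needed here."""
    pos = buffer.index("|")
    chars = buffer.replace("|", "")
    levels = [int(d) for d in digits[:pos] + digits[pos + 1:]]
    shown = "".join(chars[i] for i in visual_order(levels))
    x = simplified_cursor_x(levels, pos)
    return shown[:x] + "|" + shown[x:]


print(render_example("latinHEBREW1234|", "0000011111122222"))  # latin1234|WERBEH
```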