From: Philippe Verdy verdy_p(at)wanadoo.fr
Date: Thu, 10 Jul 2003 15:04:56 +0200
Subject: Re: Combining diacriticals and Cyrillic

On Thursday, July 10, 2003 10:24 AM, vladimirg(at)need.bg wrote:
> Dear Ladies and Gentlemen,
>
> Currently there is an ongoing effort in Bulgaria to resolve an
> issue concerning the way we write in Bulgarian.
>
> Our problem is:
>
> Usually a regular Bulgarian user does not need to write accented
> characters. There is one middle-sized exception to this, but
> generally we do fine without accented characters. The problem is that
> in some special cases, or in more serious linguistic work, one definitely
> needs to be able to write accented characters (accented vowels).
>
> One of the ideas is to invent a new ASCII-based encoding containing
> the accented characters we need. This would add further
> disorder to the current mess of Cyrillic encodings, and would
> introduce problems with automated spellchecking.
>
> Generally I believe it would be best to invent a Unicode-based
> solution.
>
> One such solution is, for example, combining diacritical signs with the
> Cyrillic letters.
>
> I composed a demo page:
> http://v.bulport.com/bugs/opera/426/balhaah_lonex_org/
>
> and then made 10-20 screenshots of the results in Opera and IE on Linux,
> Windows 98 and Windows XP:
> http://v.bulport.com/bugs/opera/426/balhaah_lonex_org/shots.html
>
> You can see that this approach yields _quite_ inconsistent and useless
> results, depending on the font, application and operating system
> being used.

On Windows XP, nothing is rendered incorrectly. However, the best rendering comes with Arial Unicode MS, which is part of Office but not part of the Windows XP or Internet Explorer fonts. The other fonts named on the page are much less common and must be installed explicitly by the user. When none of them is available, the effective font falls back to the generic sans-serif family, which the user settings bind to Arial by default on Windows. The result is then correct, with the right grave accents, but the rendering is poor: Arial does not map the combining sequence to a specially prepared ligated glyph. It simply displays the accent as a separate non-spacing glyph, a bit too high above the ascent line and not centered on the preceding character.

The reason is that Arial (/not Arial Unicode MS/) does not define, in its base-character glyphs, placement hints for each combining class of diacritics. Its diacritics are positioned only approximately, as non-spacing glyphs with a single relative offset tuned to work on most Latin letters (the Arial TrueType font includes no advanced OpenType tables for positioning pairs of glyphs). Still, the text rendered with Arial is readable and correct according to Unicode, just poorly rendered.

Note that the exact version of these fonts matters: the Arial font provided with Windows 95 is TrueType only (there's no OpenType font support in W95, and the UniScribe engine is only provided as a supplement with Internet Explorer 5+; it is not used by Netscape 4, and probably not by other browsers either)...
On Windows XP, the use of UniScribe and its support for OpenType fonts is transparent to most applications (it is integrated into most GDI primitives and USER32 GUI components). So the differences come less from the browser than from the OS version (and, for older OSes, its localization).
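The encoding side of the combining approach discussed above can be checked independently of any font. The following is a minimal Python sketch. It assumes stress is marked with U+0300 COMBINING GRAVE ACCENT (the grave accents mentioned above). The vowel list and the helper names `stressed` and `describe` are hypothetical, chosen only for illustration. The sketch builds each stressed vowel as a base letter plus a combining mark and shows what NFC normalization does with it.

```python
import unicodedata

# U+0300 COMBINING GRAVE ACCENT; its canonical combining class is 230 ("above").
COMBINING_GRAVE = "\u0300"

# Hypothetical illustration set: lowercase Bulgarian vowel letters.
BULGARIAN_VOWELS = "аеиоуъюя"


def stressed(vowel: str) -> str:
    """Mark stress by appending U+0300 to the base letter (a combining sequence)."""
    return vowel + COMBINING_GRAVE


def describe(vowel: str) -> tuple[str, int, str]:
    """Return (NFC form, NFC length in code points, name of the first NFC code point)."""
    nfc = unicodedata.normalize("NFC", stressed(vowel))
    return nfc, len(nfc), unicodedata.name(nfc[0])


if __name__ == "__main__":
    print("combining class of U+0300:", unicodedata.combining(COMBINING_GRAVE))
    for v in BULGARIAN_VOWELS:
        nfc, length, first = describe(v)
        kind = "precomposed" if length == 1 else "base + combining mark"
        print(f"{v}: {length} code point(s), {kind}, starts with {first}")
```

Only е and и have precomposed forms with grave (U+0450 and U+045D), so NFC folds those two sequences into single code points. The other stressed vowels remain a base letter followed by U+0300. Either way the text is valid Unicode, and how well the mark is placed depends on the font and the rendering engine, as described above.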