Difference between revisions of "Unicode"
== Origin and development ==
It is the explicit aim of Unicode to transcend the limitations of traditional [[:en:character encoding|character encoding]]s, such as those defined by the [[:en:ISO 8859|ISO 8859]] standard, which are used in various countries of the world but are largely incompatible with each other.
Unicode's intent is to encode the underlying characters and not the variant [[:en:glyph|glyph]]s for such characters.
Unicode's role in text processing is to provide a unique code point (not a glyph) for each character. In other words, Unicode represents a character in an abstract way and leaves the visual rendering (size, shape or style) to another program, such as a [[:en:web browser|web browser]] or [[:en:word processor|word processor]].
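The following minimal Python sketch makes the code point/glyph distinction concrete. The helper name `code_points` is hypothetical and chosen only for illustration. It lists the abstract code point of each character in the conventional `U+XXXX` notation. Nothing in the result describes size, shape or style, because those remain the renderer's job.

```python
def code_points(text: str) -> list[str]:
    """Return the Unicode code point of each character in U+XXXX notation.

    A Python str is a sequence of code points, so ord() yields the abstract
    number Unicode assigns, independent of any font or glyph.
    """
    return [f"U+{ord(ch):04X}" for ch in text]


# Latin, Devanagari, a currency sign and an emoji: one code point each.
print(code_points("A\u0905\u20ac\U0001F600"))
# ['U+0041', 'U+0905', 'U+20AC', 'U+1F600']
```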
The Unicode standard also includes a number of related items, such as character properties, text normalisation forms, and bidirectional display order (for the correct display of text containing both right-to-left scripts, such as [[:en:Arabic|Arabic]] or [[:en:Hebrew language|Hebrew]], and left-to-right scripts).
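Python's standard `unicodedata` module exposes some of these character properties, including the bidirectional class that display ordering depends on. The sketch below looks up a few properties for one character. It then picks a paragraph direction from the first strongly directional character, which is a simplified take on rules P2 and P3 of the Unicode Bidirectional Algorithm (UAX #9). It deliberately ignores isolates, embeddings and all of the later reordering steps. The helper names `describe` and `paragraph_direction` are illustrative.

```python
import unicodedata


def describe(ch: str) -> dict[str, str]:
    """Look up a few properties defined in the Unicode Character Database."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),         # e.g. Lo = other letter
        "bidi_class": unicodedata.bidirectional(ch),  # L, R, AL, EN, WS, ...
    }


def paragraph_direction(text: str) -> str:
    """Simplified UAX #9 rules P2/P3: the first strong character decides.

    Unlike the full algorithm, isolate initiators are NOT skipped here;
    text without any strong character defaults to left-to-right.
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):  # Hebrew-type and Arabic-type letters
            return "rtl"
    return "ltr"


print(describe("\u05d0")["bidi_class"])  # R  (HEBREW LETTER ALEF)
print(describe("\u0628")["bidi_class"])  # AL (ARABIC LETTER BEH)
# Digits and spaces are weak/neutral, so the Hebrew word decides: rtl
print(paragraph_direction("123 \u05e9\u05dc\u05d5\u05dd"))
```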
In [[1997]] a proposal was made by [[:en:Michael Everson|Michael Everson]] to encode the characters of the [[Klingon language]] in Plane 1 of [[ISO 10646|ISO/IEC 10646-2]]. The proposal was rejected in [[2001]] as "inappropriate for encoding". The reason was not that it was technically faulty, but that users of Klingon normally read, write and exchange data in [[Latin]] transliteration. The [[Elves (Middle-earth)|elvish]] scripts [[Tengwar]] and [[Cirth]] from [[J. R. R. Tolkien]]'s [[Middle-earth]] setting were proposed for inclusion in Plane 1 in [[1993]].
== Mapping and encodings ==
* [[2003]] Unicode 4.0
* [[2005]] Unicode 4.1
=== Storage, transfer and processing ===
The mapping methods are called the UTF (Unicode Transformation Format) and UCS (Universal Character Set) encodings. Among them are [[UTF-32]], [[UCS-4]], [[UTF-16]], [[UCS-2]], [[UTF-8]], [[UTF-EBCDIC]] and [[UTF-7]]. For UTF encodings, the number in the name is the number of bits in one unit; for UCS encodings, it is the number of bytes. In UTF-32 or UCS-4, one unit is enough for any character. UTF-16, UTF-8, UTF-EBCDIC and UTF-7 use a variable number of units per character. UCS-2 always uses one unit, but it can only represent characters in the Basic Multilingual Plane (U+0000 to U+FFFF). UTF-8 is the de facto standard encoding for the interchange of Unicode text, while UTF-16 and UTF-32 are used mainly for internal processing.
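The difference between fixed and variable numbers of units can be checked directly. This minimal Python sketch encodes single characters and counts the code units each one needs: 8-bit units for UTF-8, 16-bit units for UTF-16 and 32-bit units for UTF-32. The helper name `code_units` is hypothetical. The sketch uses Python's explicit little-endian codec names, because the plain `utf-16` and `utf-32` codecs prepend a byte order mark, which would inflate the counts.

```python
def code_units(ch: str) -> dict[str, int]:
    """Count the code units one character needs in three UTF encodings."""
    return {
        "UTF-8": len(ch.encode("utf-8")),            # 1-byte units
        "UTF-16": len(ch.encode("utf-16-le")) // 2,  # 2-byte units
        "UTF-32": len(ch.encode("utf-32-le")) // 4,  # 4-byte units
    }


for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", code_units(ch))
# U+0041 {'UTF-8': 1, 'UTF-16': 1, 'UTF-32': 1}
# U+00E9 {'UTF-8': 2, 'UTF-16': 1, 'UTF-32': 1}
# U+20AC {'UTF-8': 3, 'UTF-16': 1, 'UTF-32': 1}
# U+1F600 {'UTF-8': 4, 'UTF-16': 2, 'UTF-32': 1}
```

U+1F600 lies outside the Basic Multilingual Plane. That is why UTF-16 needs two units for it (a surrogate pair) and why UCS-2 cannot represent it at all.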
The Unicode [[Byte Order Mark|byte order mark]] (BOM) is specified for use at the beginning of text files in the UCS-2 and UTF-16 encodings. Some software developers have adopted it for other encodings as well, including UTF-8, which does not need an indication of byte order. In that case the BOM serves only to mark the file as containing Unicode text. The BOM is code point <code>U+FEFF</code>, which has the important property of being unambiguously interpretable regardless of which Unicode encoding is used.
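U+FEFF serializes to a distinct byte sequence in each encoding, so a reader can identify the encoding by sniffing the first few bytes of a file. The sketch below relies on the BOM constants from Python's standard `codecs` module. The function names `sniff_bom` and `decode_with_bom` are hypothetical. It tests the UTF-32 patterns before the UTF-16 ones, because the UTF-32-LE mark (`FF FE 00 00`) begins with the UTF-16-LE mark (`FF FE`).

```python
import codecs
from typing import Optional

# Longest patterns first: BOM_UTF32_LE starts with the bytes of BOM_UTF16_LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> tuple[Optional[str], int]:
    """Return (encoding, BOM length), or (None, 0) if no BOM is present.

    Caveat: UTF-16-LE text whose first character is U+0000 cannot be told
    apart from a UTF-32-LE BOM by this test alone.
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode bytes, honouring and stripping a leading BOM if there is one."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or default)


print(sniff_bom("\ufeffhi".encode("utf-8")))  # ('utf-8', 3)
```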
{{See also|Mapping of Unicode characters}}
The [[CJK]] ideographs are currently encoded only in their precomposed form. Still, most of those ideographs are evidently made up of simpler elements, so in principle it would be possible to decompose them, just as is done with [[Hangul]]. This would greatly reduce the number of required code points while allowing the display of virtually every conceivable ideograph (and so doing away with all problems of [[Han unification]]). A similar idea is used in some [[input method]]s, such as [[Cangjie method|Cangjie]] and [[Wubi method|Wubi]]. However, attempts to do this for character encoding have stumbled over the fact that ideographs do not decompose as simply or as regularly as they seem to.
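The Hangul case works because modern Hangul syllables are laid out arithmetically. Each precomposed syllable in the range U+AC00 to U+D7A3 can be computed from, and split back into, a leading consonant, a vowel and an optional trailing consonant (the conjoining jamo). The sketch below implements that decomposition with the constants the Unicode Standard gives for conjoining jamo behaviour, and its output can be checked against Python's `unicodedata` NFD normalization. The function name is illustrative, and no comparable formula exists for CJK ideographs.

```python
import unicodedata

# Constants of the Hangul syllable algorithm in the Unicode Standard.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables


def decompose_hangul(ch: str) -> str:
    """Split one precomposed Hangul syllable into its conjoining jamo.

    Characters outside U+AC00..U+D7A3 are returned unchanged.
    """
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch
    lead = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    trail = T_BASE + s_index % T_COUNT
    jamo = chr(lead) + chr(vowel)
    if trail != T_BASE:  # T_BASE itself means "no trailing consonant"
        jamo += chr(trail)
    return jamo


print([f"U+{ord(j):04X}" for j in decompose_hangul("\ud55c")])  # syllable U+D55C
# ['U+1112', 'U+1161', 'U+11AB']
print(decompose_hangul("\ud55c") == unicodedata.normalize("NFD", "\ud55c"))  # True
```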
Combining marks, like the complex script shaping required to properly render [[Arabic]] text and many other scripts, usually depend on complex font technologies. Examples are [[OpenType]] (by Adobe and [[Microsoft]]), Graphite (by [[SIL International]]) and [[Apple Advanced Typography|AAT]] (by [[Apple Computer|Apple]]). With these technologies, a font designer includes instructions in a font that tell software how to properly output different character sequences.
[[As of 2004]], most software still cannot reliably handle many features not supported by older font formats, so combining characters generally will not work correctly. In principle, {{Unicode|ḗ}} (precomposed e with macron and acute above) and {{Unicode|ḗ}} (e followed by the combining macron above and combining acute above) should be identical in appearance, both giving an [[e]] with [[macron]] and [[acute accent]]. In practice, their appearance can vary greatly across software applications.
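The two spellings of ḗ in this example are canonically equivalent, which means Unicode normalization maps one onto the other. The short Python check below uses the standard `unicodedata.normalize` function. The two strings differ as raw code point sequences, but they agree after normalization to NFC (composed) or NFD (decomposed). The helper name `canonically_equal` is illustrative.

```python
import unicodedata

precomposed = "\u1E17"        # LATIN SMALL LETTER E WITH MACRON AND ACUTE
decomposed = "e\u0304\u0301"  # e + COMBINING MACRON + COMBINING ACUTE ACCENT


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings modulo canonical equivalence, via NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed == decomposed)                    # False: 1 vs 3 code points
print(canonically_equal(precomposed, decomposed))   # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```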
=== Operating systems ===
Despite technical problems, limitations and criticism of its process, Unicode has emerged as the dominant encoding scheme.
=== E-mail ===
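[[MIME]] defines two different mechanisms for encoding non-ASCII characters in [[इमेल|e-mail]], depending on whether the characters are in e-mail headers such as "Subject:" or in the text body of the message. Headers use RFC 2047 "encoded-words". The body declares its character set with the charset parameter of the Content-Type header and may be transferred with a content transfer encoding such as quoted-printable or base64.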
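Python's standard `email` package shows both mechanisms. `email.header.Header` produces an RFC 2047 encoded-word for a header such as Subject. `EmailMessage.set_content` sets a `charset` parameter on the body's Content-Type and chooses a Content-Transfer-Encoding for it. This is a minimal sketch: the function names and the sample subject and body are made up for illustration.

```python
from email.header import Header, decode_header
from email.message import EmailMessage


def encode_subject(text: str) -> str:
    """Encode a header value as RFC 2047 encoded-word(s) using UTF-8."""
    return Header(text, "utf-8").encode()


def build_message(subject: str, body: str) -> EmailMessage:
    """Build a message whose header and body may both contain non-ASCII text."""
    msg = EmailMessage()
    msg["Subject"] = subject  # encoded as needed when the message is serialized
    msg.set_content(body)     # sets the charset and a transfer encoding
    return msg


print(encode_subject("Grüße"))           # an =?utf-8?b?...?= encoded-word
msg = build_message("Grüße", "नमस्ते, Unicode!")
print(msg.get_content_charset())         # utf-8
print(msg["Content-Transfer-Encoding"])  # e.g. 8bit, base64 or quoted-printable
```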
The adoption of Unicode in [[इमेल|e-mail]] has been very slow. Most East Asian text is still encoded in a local encoding such as [[Shift-JIS]], and many commonly used e-mail programs still cannot handle Unicode data correctly, if they support it at all. This situation is not expected to change in the foreseeable future.
Word 2003 also allows Unicode characters to be entered by typing the hexadecimal code point first, e.g. 014B for the 'ng' symbol (ŋ), and then pressing Alt+X. This replaces the string to the left of the cursor with the corresponding Unicode character.
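The Alt+X behaviour amounts to "read the hexadecimal digits before the cursor and replace them with the character they name". The Python sketch below emulates that idea on a plain string. It is a hypothetical simplification for illustration (the function `alt_x` and the six-digit limit are assumptions), not Word's actual algorithm, and it does not claim to reproduce Word's exact rules.

```python
import re

# Hypothetical emulation: up to six hex digits immediately before the cursor.
_TRAILING_HEX = re.compile(r"[0-9A-Fa-f]{1,6}$")


def alt_x(before_cursor: str) -> str:
    """Replace trailing hex digits with the Unicode character they encode.

    The text is returned unchanged if there are no trailing hex digits or
    if the number is not a valid Unicode scalar value.
    """
    match = _TRAILING_HEX.search(before_cursor)
    if match is None:
        return before_cursor
    cp = int(match.group(), 16)
    if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:  # out of range or surrogate
        return before_cursor
    return before_cursor[: match.start()] + chr(cp)


print(alt_x("si014B"))  # siŋ  (U+014B LATIN SMALL LETTER ENG)
```

A real editor faces an ambiguity that this sketch shares: a word such as "cafe" consists entirely of hex digits and would be converted too.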
Macintosh users have a similar feature in [[Mac OS X]] and in [[Mac OS]] 8.5 and later, through an input method called 'Unicode Hex Input': hold down the Option key and type the four-hex-digit Unicode code point. Code points above 0xFFFF are entered as a [[UTF-16|surrogate pair]], which is converted into a single character automatically.
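Entering a surrogate pair works because of how UTF-16 represents each code point from U+10000 to U+10FFFF: as two 16-bit units. First, 0x10000 is subtracted. The remaining 20 bits are then split into a high surrogate (U+D800 to U+DBFF) and a low surrogate (U+DC00 to U+DFFF). Below is a Python sketch of both directions; the function names are illustrative, and the result can be checked against Python's own UTF-16 codec.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} does not need a surrogate pair")
    offset = cp - 0x10000            # a 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Combine a high and a low surrogate back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x1F600)
print(hex(high), hex(low))                              # 0xd83d 0xde00
print(chr(from_surrogates(high, low)) == "\U0001F600")  # True
```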
[[GNOME|Gnome2]] follows [[ISO 14755]]: hold down Ctrl and Shift and enter the hexadecimal Unicode value.
* [http://www.decodeunicode.org/ DecodeUnicode - Unicode Wiki, 50,000 GIFs and information about each character]
* [http://www.cl.cam.ac.uk/~mgk25/ucs/examples/ Example text files using Unicode]
* [http://www.lazytools.com/unicode-ascii/
* [[Michael Everson]]'s [http://www.unicode.org/notes/tn4/everson-iuc21pap.pdf "
* [http://www.evertype.com/standards/csur/ ConScript Unicode Registry] a project to standardize part of the Private Use Area for use with [[artificial script]]s and artificial languages.
* [http://www-106.ibm.com/developerworks/unicode/library/u-secret.html The secret life of Unicode] "A peek at Unicode's soft underbelly" Describes problems requiring resolution.
* Tim Bray's [http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF Characters vs Bytes] explains how the different encodings work.
* [http://www.joelonsoftware.com/articles/Unicode.html The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)] by [[Joel Spolsky]]