"युनिकोड" का संशोधनहरू बिचको अन्तर

Line 6:
== Origin and development ==
 
It is the explicit aim of Unicode to transcend the limitations of traditional [[:en:character encoding|character encoding]]s such as those defined by the [[:en:ISO 8859|ISO 8859]] standard, which are used in the various countries of the world but are largely incompatible with each other. One problem with traditional character encodings is that they allow for [[:en:bilingual|bilingual]] computer processing (usually [[:en:Roman character|Roman characters]] and the local language), but not for multilingual computer processing (computer processing of arbitrary languages mixed with each other).
 
In intent, Unicode encodes the underlying characters and not the variant [[:en:glyph|glyph]]s for such characters. In the case of [[:en:Chinese character|Chinese characters]], this sometimes leads to controversies over what is the underlying character and what is the variant glyph (see [[:en:Han unification|Han unification]]).
 
Unicode's role in text processing is to provide a unique code point, not a glyph, for each character. In other words, Unicode represents a character in an abstract way and leaves the visual rendering (size, shape or style) to another program, such as a [[:en:web browser|web browser]] or [[:en:word processor|word processor]].
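
The relationship between a character and its code point can be sketched in a few lines of Python. This is only an illustration, not part of the standard: the sample letter is an arbitrary choice, and the variable names are made up for the example.

```python
# A character is identified by its code point, independent of any glyph or font.
ch = "\u014b"                  # LATIN SMALL LETTER ENG, chosen only as a sample
code_point = ord(ch)           # the abstract integer assigned to the character
label = f"U+{code_point:04X}"  # conventional U+XXXX notation
round_trip = chr(code_point)   # back from code point to character

print(label, round_trip)
```

How the character is drawn (size, shape or style) plays no part in any of these values; that is left to the rendering program.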
Line 16:
The Unicode standard also includes a number of related items, such as character properties, text normalisation forms, and bidirectional display order (for the correct display of text containing both right-to-left scripts, such as [[:en:Arabic|Arabic]] or [[:en:Hebrew language|Hebrew]], and left-to-right scripts).
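
As a rough illustration of character properties, the sketch below uses Python's standard <code>unicodedata</code> module to read the name and bidirectional class of three sample letters. The choice of letters and the dictionary names are assumptions made for the example.

```python
import unicodedata

# One sample letter per script; the selection is arbitrary.
samples = ["a", "\u05d0", "\u0627"]  # Latin a, Hebrew alef, Arabic alef

# Bidirectional class: "L" = left-to-right, "R" = right-to-left,
# "AL" = right-to-left Arabic letter.
bidi = {ch: unicodedata.bidirectional(ch) for ch in samples}
names = {ch: unicodedata.name(ch) for ch in samples}

for ch in samples:
    print(f"U+{ord(ch):04X}", names[ch], bidi[ch])
```

A bidirectional display algorithm relies on classes like these to decide the visual order of mixed right-to-left and left-to-right text.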
 
In [[1997]] a proposal was made by [[:en:Michael Everson|Michael Everson]] to encode the characters of the [[Klingon language]] in Plane 1 of [[ISO 10646|ISO/IEC 10646-2]]. The proposal was rejected in [[2001]] as "inappropriate for encoding", not because it was technically faulty, but because users of Klingon normally read, write and exchange data in [[Latin]] transliteration. The [[Elves (Middle-earth)|elvish]] scripts [[Tengwar]] and [[Cirth]] from [[J. R. R. Tolkien]]'s [[Middle-earth]] setting were proposed for inclusion in Plane 1 in [[1993]]. The draft was withdrawn to incorporate changes suggested by [[Tolkienist]]s, and as of [[2004]] is still under consideration.
 
== Mapping and encodings ==
Line 36:
* [[2003]] Unicode 4.0
* [[2005]] Unicode 4.1
 
=== Storage, transfer and processing ===
Line 46:
The mapping methods are called the UTF (Unicode Transformation Format) and UCS (Universal Character Set) encodings. Among them are [[UTF-32]], [[UCS-4]], [[UTF-16]], [[UCS-2]], [[UTF-8]], [[UTF-EBCDIC]] and [[UTF-7]]. The numbers indicate the number of bits in one unit for UTF encodings, or the number of bytes for UCS encodings. In UTF-32 or UCS-4, one unit is enough for any character; UCS-2 also uses one unit per character, but can represent only code points up to U+FFFF; in the other cases, a variable number of units is used for each character. UTF-8 is the de facto standard encoding for interchange of Unicode text, with UTF-16 and UTF-32 being used mainly for internal processing.
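
The difference in unit counts can be checked with a short Python sketch. The helper <code>code_units</code> and the sample characters are hypothetical, introduced only for this illustration; the big-endian codec variants are used so that no byte order mark is added to the output.

```python
def code_units(text: str, encoding: str, unit_bytes: int) -> int:
    """Number of code units needed to encode text (hypothetical helper)."""
    return len(text.encode(encoding)) // unit_bytes

# Sample characters: ASCII, Latin Extended, Devanagari, and one outside the BMP.
samples = ["A", "\u014b", "\u092f", "\U0001d11e"]

units = {
    ch: (
        code_units(ch, "utf-8", 1),      # 8-bit units
        code_units(ch, "utf-16-be", 2),  # 16-bit units
        code_units(ch, "utf-32-be", 4),  # 32-bit units
    )
    for ch in samples
}

for ch, (u8, u16, u32) in units.items():
    print(f"U+{ord(ch):04X}: UTF-8={u8} UTF-16={u16} UTF-32={u32}")
```

UTF-32 always needs exactly one unit, while UTF-8 needs between one and four and UTF-16 needs one or two.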
 
The Unicode [[Byte Order Mark|byte order mark]] (BOM) is specified for use at the beginnings of text files in the UCS-2 and UTF-16 encodings. Some software developers have adopted it for other encodings, including UTF-8, which does not need an indication of byte order; in that case it serves as an attempt to mark the file as containing Unicode text. The BOM is code point <code>U+FEFF</code>, which has the important property of being unambiguously interpretable regardless of which Unicode encoding is used. The bytes <code>FE</code> and <code>FF</code> never appear in [[UTF-8]], <code>U+FFFE</code> (the result of byte-swapping <code>U+FEFF</code>) is not a legal character, and <code>U+FEFF</code> is the Zero-Width No-Break Space (a character with no appearance and no effect other than preventing the formation of [[ligature (typography)|ligatures]]). The same character converted to UTF-8 becomes the byte sequence <code>EF BB BF</code>.
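
The byte sequences described above can be reproduced in Python. The following is a minimal sketch: the function <code>sniff_bom</code> is a hypothetical helper written for this example, and it deliberately ignores UTF-32, whose little-endian BOM begins with the same two bytes as the UTF-16 little-endian one.

```python
import codecs

BOM = "\ufeff"

utf8_bom = BOM.encode("utf-8")          # EF BB BF
utf16_be_bom = BOM.encode("utf-16-be")  # FE FF
utf16_le_bom = BOM.encode("utf-16-le")  # FF FE (byte-swapped)

def sniff_bom(data: bytes):
    """Guess an encoding from a leading BOM; returns None if there is none.

    Hypothetical helper: it checks only UTF-8 and UTF-16.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"   # this codec strips the BOM while decoding
    if data.startswith(codecs.BOM_UTF16_BE) or data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16"      # this codec reads the BOM to pick the byte order
    return None

payload = utf8_bom + "Unicode".encode("utf-8")
guess = sniff_bom(payload)
text = payload.decode(guess)
print(utf8_bom.hex(), utf16_be_bom.hex(), utf16_le_bom.hex(), guess, text)
```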
 
{{See also|Mapping of Unicode characters}}
Line 58:
The [[CJK]] ideographs are currently encoded only in their precomposed form. Still, most of those ideographs are evidently made up of simpler elements, so in principle it would be possible to decompose them just as is done with [[Hangul]]. This would greatly reduce the number of required code points, while allowing the display of virtually every conceivable ideograph (and so doing away with all problems of the [[Han unification]]). A similar idea is used for some [[input method]]s, such as [[Cangjie method|Cangjie]] and [[Wubi method|Wubi]]. However, attempts to do this for character encoding have stumbled over the fact that ideographs are not as simply decomposed or as regular as they seem.
 
Combining marks, like the complex script shaping required to properly render [[Arabic]] text and many other scripts, usually depend on complex font technologies such as [[OpenType]] (by Adobe and [[Microsoft]]), Graphite (by [[SIL International]]), and [[Apple Advanced Typography|AAT]] (by [[Apple Computer|Apple]]), by which a font designer includes instructions in a font telling software how to properly output different character sequences. Another method sometimes employed in [[fixed-width]] fonts is to place the combining mark's glyph before its own left [[sidebearing]]; this method, however, works only for some diacritics, and stacking will not occur properly.
 
[[As of 2004]], most software still cannot reliably handle many features not supported by older font formats, so combining characters generally will not work correctly. In principle, {{Unicode|ḗ}} (precomposed e with macron and acute above) and {{Unicode|ḗ}} (e followed by the combining macron above and combining acute above) are identical in appearance, both giving an [[e]] with [[macron]] and [[acute accent]], but in practice their appearance can vary greatly across software applications.
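
The two spellings of this letter are different code point sequences that normalisation maps onto each other. A minimal Python sketch using the standard <code>unicodedata</code> module (variable names are illustrative only):

```python
import unicodedata

precomposed = "\u1e17"       # LATIN SMALL LETTER E WITH MACRON AND ACUTE
decomposed = "e\u0304\u0301"  # e + COMBINING MACRON + COMBINING ACUTE ACCENT

# As raw strings the two differ, even though they should render alike.
same_raw = precomposed == decomposed

# NFC composes where possible; NFD decomposes fully.
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == decomposed

print(same_raw, nfc_equal, nfd_equal)
```

Comparing text only after normalising both sides to the same form avoids treating such equivalent sequences as different strings; it does not, however, fix rendering problems in fonts or applications.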
Line 77:
=== Operating systems ===
 
Despite technical problems, limitations and criticism of the process, Unicode has emerged as the dominant encoding scheme. [[Windows NT]] and its descendants [[Windows 2000]] and [[Windows XP]] make extensive use of [[UTF-16]] as an internal representation of text. UNIX-like operating systems such as [[GNU/Linux]], [[BSD]] and [[Mac OS X]] have adopted [[UTF-8]] as the basis of representation of [[multilingual text]].
 
=== E-mail ===
 
[[MIME]] defines two different mechanisms for encoding non-ASCII characters in [[e-mail]], depending on whether the characters are in e-mail headers such as the "Subject:" or in the text body of the message. In both cases, the original character set is identified as well as a transfer encoding. For e-mail transmission of Unicode, the UTF-8 character set and the [[Base64]] transfer encoding are recommended. The details of the two mechanisms are specified in the MIME standards and are generally hidden from users of e-mail software.
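
For the header case, the combination of UTF-8 and Base64 can be sketched as follows. The function <code>encode_subject</code> is a hypothetical helper that builds a single MIME "encoded-word" by hand; real mail software also has to split long headers, since one encoded-word may be at most 75 characters, which this sketch does not attempt.

```python
import base64
from email.header import decode_header

def encode_subject(text: str) -> str:
    """Build one MIME encoded-word: =?charset?B?base64-payload?= (hypothetical helper)."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return f"=?UTF-8?B?{payload}?="

subject = "\u092f\u0941\u0928\u093f\u0915\u094b\u0921"  # sample non-ASCII subject
encoded = encode_subject(subject)

# Decode again with the standard library to check the round trip.
raw, charset = decode_header(encoded)[0]
decoded = raw.decode(charset)

print(encoded, charset, decoded == subject)
```

The encoded header contains only ASCII characters, so it passes through mail systems that cannot carry 8-bit data.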
 
The adoption of Unicode in [[e-mail]] has been very slow. Most East Asian text is still encoded in a local encoding such as [[Shift-JIS]], and many commonly used e-mail programs still cannot handle Unicode data correctly, if they support it at all. This situation is not expected to change in the foreseeable future.
Line 111:
Word 2003 also allows for entering Unicode characters by typing the code first, e.g. 014B for the 'ng' symbol, and then pressing 'Alt' plus 'X' to replace the string to the left with its Unicode character.
 
Macintosh users have a similar feature with an input method called 'Unicode Hex Input', in [[Mac OS X]] and in [[Mac OS]] 8.5 and later: hold down the Option key and type the four-hex-digit Unicode code point. Code points above 0xFFFF are handled by entering a [[UTF-16|surrogate pair]], which is converted into a single character automatically. Mac OS X (version 10.2 and newer) also has a 'Character Palette', which allows users to visually select any Unicode character from a table organized numerically, by Unicode block, or by a selected font's available characters.
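
The surrogate pair for a code point above 0xFFFF follows from simple arithmetic on the UTF-16 encoding. The sketch below shows the calculation; the function names are hypothetical and the sample code point is an arbitrary choice.

```python
def to_surrogate_pair(code_point: int):
    """Split a code point above 0xFFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points from 0x10000 to 0x10FFFF need surrogates")
    offset = code_point - 0x10000          # a 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogate_pair(0x1D11E)     # sample code point outside the BMP
restored = from_surrogate_pair(high, low)
print(f"{high:04X} {low:04X} -> U+{restored:X}")
```

With such an input method, the user would type the two four-digit values in sequence, and the system would combine them into the single character.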
 
[[GNOME|Gnome2]] follows [[ISO 14755]]. Hold down Ctrl and Shift and enter the hexadecimal Unicode value.
Line 129:
* [http://www.decodeunicode.org/ DecodeUnicode - Unicode Wiki, 50,000 gifs and information about each character]
* [http://www.cl.cam.ac.uk/~mgk25/ucs/examples/ Example text files using Unicode]
* [http://www.lazytools.com/unicode-ascii/ Unicode special character map] is similar to the Windows version. Click a symbol to obtain either the named or numeric code for HTML.
* [[Michael Everson]]'s [http://www.unicode.org/notes/tn4/everson-iuc21pap.pdf "Leaks in the Unicode pipeline: script, script, script…"] PDF 2MB
* [http://www.evertype.com/standards/csur/ ConScript Unicode Registry] a project to standardize part of the Private Use Area for use with [[artificial script]]s and artificial languages. An explanation of how to propose character names in Unicode is available here.
* [http://www-106.ibm.com/developerworks/unicode/library/u-secret.html The secret life of Unicode] "A peek at Unicode's soft underbelly" Describes problems requiring resolution. Includes links to Unicode resources.
* Tim Bray's [http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF Characters vs Bytes] explains how the different encodings work.
* [http://www.joelonsoftware.com/articles/Unicode.html The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)] by [[Joel Spolsky]]