It often happens that you have text data in Unicode, but you need to represent it in ASCII. For example when integrating with legacy code that doesn't support Unicode, or for ease of entry of non-Roman names on a US keyboard, or when constructing ASCII machine identifiers from human-readable Unicode strings that should still be somewhat intelligible. A common example of this is making a URL slug from an article title.

Unidecode is not a replacement for fully supporting Unicode for strings in your program. There are a number of caveats that come with its use, especially when its output is directly visible to users. Please read the rest of this README before using Unidecode in your project.

In most of the examples listed above you could represent Unicode characters as placeholder characters or escape sequences, but that is nearly useless to someone who actually wants to read what the text says.

What Unidecode provides is a middle road: the function unidecode() takes Unicode data and tries to represent it in ASCII characters (i.e., the universally displayable characters between 0x00 and 0x7F), where the compromises taken when mapping between two character sets are chosen to be near what a human with a US keyboard would choose.

The quality of the resulting ASCII representation varies. For languages of Western origin it should be between perfect and good. On the other hand, transliteration (i.e., conveying, in Roman letters, the pronunciation expressed by the text in some other writing system) of languages like Chinese, Japanese or Korean is a very complex issue and this library does not attempt to address it. It draws the line at context-free character-by-character mapping. So a good rule of thumb is that the further the script you are transliterating is from the Latin alphabet, the worse the transliteration will be.

Generally Unidecode produces better results than simply stripping accents from characters (which can be done in Python with built-in functions). It is based on hand-tuned character mappings that, for example, also contain ASCII approximations for symbols and non-Latin alphabets.

Note that some people might find certain transliterations offensive. The most common examples include characters that are used in multiple languages. A user expects a character to be transliterated in their language, but Unidecode uses a transliteration for a different language. It is best not to use Unidecode for strings that are directly visible to users of your application. See also the Frequently Asked Questions section for more info on common problems.

This is a Python port of the Text::Unidecode Perl module by Sean M.
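The README contrasts Unidecode with simply stripping accents using Python's built-in functions. As a rough illustration of that simpler approach and where it falls short, here is a minimal sketch using only the standard library. The helper name strip_accents is hypothetical and not part of Unidecode; the final comparison runs only if the unidecode package happens to be installed.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical helper: drop combining marks, then discard anything non-ASCII."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Characters with no ASCII base letter are silently dropped here.
    return without_marks.encode("ascii", "ignore").decode("ascii")


samples = ["kožušček", "Ærøskøbing", "北京"]

for sample in samples:
    print(repr(sample), "->", repr(strip_accents(sample)))

# Optional comparison, only if the third-party package is available.
try:
    from unidecode import unidecode
except ImportError:
    unidecode = None

if unidecode is not None:
    for sample in samples:
        print(repr(sample), "->", repr(unidecode(sample)))
```

Accent stripping handles the first sample but loses letters such as Æ and ø, which have no Unicode decomposition, and drops the Chinese text entirely. That gap is what a hand-tuned character mapping is meant to cover.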