String str = "420" + 42; Why is 42 treated as the string "42" here rather than as an ASCII code?

ASCII & Unicode Character Codes for HTML, C#, VBScript, VB.NET, PHP & JavaScript
Character encoding is useful in web development for displaying non-standard characters,
generating character strings from numbers (CAPTCHA images for instance),
and many other purposes. Unicode is an industry standard developed from the earlier
ASCII standard (which is now a subset of Unicode).
To display Unicode (ASCII) characters in HTML, use &#XXX;. In C#, use Convert.ToChar(XXX); in VBScript, VB.NET
and PHP, use the chr(XXX) function; in JavaScript, use String.fromCharCode(XXX), where XXX is the entity number.
ASCII codes were originally developed for teletype machines, and the first 32 characters
are non-printing (bell, backspace, etc.).
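The per-language functions listed above all map a number to a character. As a hedged illustration, the same mapping can be sketched in Python 3 with the built-ins chr() and ord(); the helper names below are hypothetical and exist only for this example.

```python
# Sketch: converting between code numbers and characters in Python 3.
# All helper names are made up for illustration.

def char_from_code(code: int) -> str:
    """Return the character for a code number (like chr(XXX) above)."""
    return chr(code)

def code_from_char(ch: str) -> int:
    """Return the code number of a single character."""
    return ord(ch)

def html_entity(ch: str) -> str:
    """Build the &#XXX; numeric entity for a character."""
    return "&#{};".format(ord(ch))

def is_non_printing(code: int) -> bool:
    """True for the first 32 ASCII codes (bell, backspace, etc.)."""
    return 0 <= code < 32

if __name__ == "__main__":
    print(char_from_code(65))   # A
    print(code_from_char("A"))  # 65
    print(html_entity("A"))     # &#65;
    print(is_non_printing(7))   # True (7 is the bell character)
```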
Python string encode and decode: research notes and fixes for garbled-text problems
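Why do all sorts of garbled-text problems show up when using Python, with Chinese characters displayed in the form "\xe4\xb8\xad\xe6\x96\x87"?
Why does the error "UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)" appear? This article looks into the problem.
Python (Python 2, which all the code in this article targets) represents strings internally as unicode. An encoding conversion therefore normally uses unicode as the intermediate form: first decode the string from its current encoding into unicode, then encode the unicode into the target encoding.
decode converts a string in some other encoding into unicode. For example, str1.decode('gb2312') converts the gb2312-encoded string str1 into unicode.
encode converts unicode into a string in some other encoding. For example, str2.encode('gb2312') converts the unicode string str2 into gb2312.
So before converting, always establish which encoding the string str is in, decode it to unicode, and only then encode it into the other encoding.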
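The decode-then-encode rule above can be sketched in Python 3, where the article's byte strings correspond to bytes and its unicode strings to str. This is a minimal sketch under that assumption; transcode is a hypothetical helper name, not something from the article.

```python
# Sketch (Python 3): convert bytes from one encoding to another by
# going through text, mirroring "decode to unicode, then encode".

def transcode(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Decode data from source_encoding, then encode it to target_encoding."""
    text = data.decode(source_encoding)   # bytes -> str (unicode)
    return text.encode(target_encoding)   # str -> bytes in the new encoding

if __name__ == "__main__":
    gb_bytes = "中文".encode("gb2312")
    utf8_bytes = transcode(gb_bytes, "gb2312", "utf-8")
    print(gb_bytes)    # b'\xd6\xd0\xce\xc4'
    print(utf8_bytes)  # b'\xe4\xb8\xad\xe6\x96\x87'
```

Passing the wrong source encoding either raises UnicodeDecodeError or silently produces garbled text, which is why the source encoding has to be known first.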
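By default, a string literal in source code has the same encoding as the source file itself.
For example: s='中文'
In a utf8 file this string is utf8-encoded; in a gb2312 file it is gb2312-encoded. Either way, converting it requires first calling decode to get unicode and then calling encode to get the target encoding. When no encoding is specified explicitly, source files are usually created in the system default encoding.
If the string is defined like this: s=u'中文'
then it is a unicode string, Python's internal representation, regardless of the encoding of the source file. Converting it only requires calling encode with the target encoding.
Decoding a string that is already unicode raises an error, so it is usual to check first whether it is unicode:
isinstance(s, unicode)  # checks whether s is unicode
Calling encode on a non-unicode str also raises an error when it contains non-ASCII bytes, because Python 2 first decodes it implicitly with the default ascii codec.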
How do you get the system default encoding?
    #!/usr/bin/env python
    #coding=utf-8
    import sys
    print sys.getdefaultencoding()
On English Windows XP this program prints: ascii
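In some IDEs, string output always comes out garbled or even raises an error. The cause is that the IDE's output console cannot display the string's encoding; the program itself is not at fault.
For example, run the following code in UliPad:
    s=u"中文"
    print s
It reports: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128). This happens because UliPad's console output window on English Windows XP writes output as ascii (the default encoding on an English system is ascii), while the string in the code is unicode, so the output step fails.
Change the last statement to: print s.encode('gb2312')
and the two characters "中文" are output correctly.
Change the last statement to: print s.encode('utf8')
and the output is: \xe4\xb8\xad\xe6\x96\x87, which is what the console output window produces when it writes a utf8-encoded string as ascii.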
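The byte sequence and the ascii failure described above can be reproduced in Python 3. This is a hedged sketch: can_encode is a hypothetical helper, and what a real console prints still depends on that console's own encoding.

```python
# Sketch (Python 3): inspect the UTF-8 bytes of a string and check
# whether a given codec can represent it.

def can_encode(text: str, encoding: str) -> bool:
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

text = "中文"
utf8_bytes = text.encode("utf-8")

if __name__ == "__main__":
    print(utf8_bytes)                  # b'\xe4\xb8\xad\xe6\x96\x87'
    print(len(text), len(utf8_bytes))  # 2 6
    print(can_encode(text, "ascii"))   # False: ordinals are outside range(128)
    print(can_encode(text, "gb2312"))  # True
```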
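unicode(str,'gb2312') is the same as str.decode('gb2312'): both convert the gb2312-encoded str into unicode.
str.__class__ shows the type of the string, that is, whether it is a str or a unicode object.
After all that theory, here is a cure-all :) The code is as follows:
    #!/usr/bin/env python
    #coding=utf-8
    s="中文"
    if isinstance(s, unicode):
        #s=u"中文"
        print s.encode('gb2312')
    else:
        #s="中文"
        print s.decode('utf-8').encode('gb2312')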
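For readers on Python 3, the same "check the type, then convert" idea might look like the sketch below. It assumes that byte input is UTF-8 unless stated otherwise; to_gb2312 and its source_encoding parameter are hypothetical names introduced for this example.

```python
# Sketch (Python 3): a rough analogue of the Python 2 cure-all.
# str is already unicode text; bytes must be decoded first.

def to_gb2312(value, source_encoding: str = "utf-8") -> bytes:
    """Return value as gb2312-encoded bytes.

    source_encoding is only used when value is bytes and is an
    assumption the caller must get right.
    """
    if isinstance(value, str):
        return value.encode("gb2312")
    return value.decode(source_encoding).encode("gb2312")

if __name__ == "__main__":
    print(to_gb2312("中文"))                  # b'\xd6\xd0\xce\xc4'
    print(to_gb2312("中文".encode("utf-8")))  # b'\xd6\xd0\xce\xc4'
```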
PostgreSQL: Documentation: 9.2: String Functions and Operators
This section describes functions and operators for examining
and manipulating string values. Strings in this context include
values of the types character, character varying, and text.
Unless otherwise noted, all of the functions listed below work on
all of these types, but be wary of potential effects of automatic
space-padding when using the character
type. Some functions also exist natively for the bit-string
types.

SQL defines some string
functions that use key words, rather than commas, to separate
arguments. Details are in Table 9-6.
PostgreSQL also provides
versions of these functions that use the regular function
invocation syntax (see Table 9-7).
Note: Before PostgreSQL 8.3, these functions would
silently accept values of several non-string data types as
well, due to the presence of implicit coercions from those
data types to text. Those coercions
have been removed because they frequently caused surprising
behaviors. However, the string concatenation operator
(||) still accepts non-string input,
so long as at least one input is of a string type, as shown
in Table 9-6. For other cases, insert an explicit coercion to
text if you need to duplicate the
previous behavior.
Table 9-6. SQL String Functions and Operators

string || string
    Returns: text. String concatenation.
    Example: 'Post' || 'greSQL'
    Result: PostgreSQL

string || non-string or non-string || string
    Returns: text. String concatenation with one non-string input.
    Example: 'Value: ' || 42
    Result: Value: 42

bit_length(string)
    Returns: int. Number of bits in string.
    Example: bit_length('jose')
    Result: 32

char_length(string) or character_length(string)
    Returns: int. Number of characters in string.
    Example: char_length('jose')
    Result: 4

lower(string)
    Returns: text. Convert string to lower case.
    Example: lower('TOM')
    Result: tom

octet_length(string)
    Returns: int. Number of bytes in string.
    Example: octet_length('jose')
    Result: 4

overlay(string placing string from int [for int])
    Returns: text. Replace substring.
    Example: overlay('Txxxxas' placing 'hom' from 2 for 4)
    Result: Thomas

position(substring in string)
    Returns: int. Location of specified substring.
    Example: position('om' in 'Thomas')
    Result: 3

substring(string [from int] [for int])
    Returns: text. Extract substring.
    Example: substring('Thomas' from 2 for 3)
    Result: hom

substring(string from pattern)
    Returns: text. Extract substring matching POSIX regular expression. See Section 9.7 for more information on pattern matching.
    Example: substring('Thomas' from '...$')
    Result: mas

substring(string from pattern for escape)
    Returns: text. Extract substring matching SQL regular expression. See Section 9.7 for more information on pattern matching.
    Example: substring('Thomas' from '%#"o_a#"_' for '#')
    Result: oma

trim([leading | trailing | both] [characters] from string)
    Returns: text. Remove the longest string containing only characters from characters (a space by default) from the start, end, or both ends (both is the default) of string.
    Example: trim(both 'xyz' from 'yxTomxx')
    Result: Tom

upper(string)
    Returns: text. Convert string to upper case.
    Example: upper('tom')
    Result: TOM
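The difference between char_length, octet_length, and bit_length only becomes visible with multibyte characters. The Python 3 sketch below illustrates the relationship; the function names mirror the SQL ones but are hypothetical Python stand-ins, and UTF-8 is assumed as the encoding.

```python
# Sketch (Python 3): character count versus byte and bit counts.
# These helpers imitate the SQL functions for illustration only.

def char_length(s: str) -> int:
    """Number of characters in s."""
    return len(s)

def octet_length(s: str, encoding: str = "utf-8") -> int:
    """Number of bytes s occupies in the assumed encoding."""
    return len(s.encode(encoding))

def bit_length(s: str, encoding: str = "utf-8") -> int:
    """Number of bits, i.e. eight per byte."""
    return 8 * octet_length(s, encoding)

if __name__ == "__main__":
    for value in ("jose", "josé"):
        print(value, char_length(value), octet_length(value), bit_length(value))
    # jose 4 4 32
    # josé 4 5 40
```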
Additional string manipulation functions are available and are
listed in Table 9-7.
Some of them are used internally to implement the SQL-standard string functions listed in Table 9-6.
Table 9-7. Other String Functions

ascii(string)
    Returns: int. ASCII code of the first character of the argument. For UTF8 returns the Unicode code point of the character. For other multibyte encodings, the argument must be an ASCII character.
    Example: ascii('x')
    Result: 120

btrim(string text [, characters text])
    Returns: text. Remove the longest string consisting only of characters in characters (a space by default) from the start and end of string.
    Example: btrim('xyxtrimyyx', 'xyz')
    Result: trim

chr(int)
    Returns: text. Character with the given code. For UTF8 the argument is treated as a Unicode code point. For other multibyte encodings the argument must designate an ASCII character. The NULL (0) character is not allowed because text data types cannot store such bytes.
    Example: chr(65)
    Result: A

concat(str "any" [, str "any" [, ...] ])
    Returns: text. Concatenate all arguments. NULL arguments are ignored.
    Example: concat('abcde', 2, NULL, 22)
    Result: abcde222

concat_ws(sep text, str "any" [, str "any" [, ...] ])
    Returns: text. Concatenate all but first arguments with separators. The first parameter is used as a separator. NULL arguments are ignored.
    Example: concat_ws(',', 'abcde', 2, NULL, 22)
    Result: abcde,2,22

convert(string bytea, src_encoding name, dest_encoding name)
    Returns: bytea. Convert string to dest_encoding. The original encoding is specified by src_encoding. The string must be valid in this encoding. Conversions can be defined by CREATE CONVERSION. Also there are some predefined conversions. See Table 9-8 for available conversions.
    Example: convert('text_in_utf8', 'UTF8', 'LATIN1')
    Result: text_in_utf8 represented in Latin-1 encoding (ISO 8859-1)

convert_from(string bytea, src_encoding name)
    Returns: text. Convert string to the database encoding. The original encoding is specified by src_encoding. The string must be valid in this encoding.
    Example: convert_from('text_in_utf8', 'UTF8')
    Result: text_in_utf8 represented in the current database encoding

convert_to(string text, dest_encoding name)
    Returns: bytea. Convert string to dest_encoding.
    Example: convert_to('some text', 'UTF8')
    Result: some text represented in the UTF8 encoding

decode(string text, format text)
    Returns: bytea. Decode binary data from textual representation in string. Options for format are same as in encode.
    Example: decode('MTIzAAE=', 'base64')
    Result: \x3132330001

encode(data bytea, format text)
    Returns: text. Encode binary data into a textual representation. Supported formats are: base64, hex, escape. escape converts zero bytes and high-bit-set bytes to octal sequences (\nnn) and doubles backslashes.
    Example: encode(E'123\\000\\001', 'base64')
    Result: MTIzAAE=

format(formatstr text [, str "any" [, ...] ])
    Returns: text. Format a string. This function is similar to the C function sprintf, but only the following conversion specifications are recognized: %s interpolates the corresponding argument as a string; %I escapes its argument as an SQL identifier; %L escapes its argument as an SQL literal; %% outputs a literal %. A conversion can reference an explicit parameter position by preceding the conversion specifier with n$, where n is the argument position.
    Example: format('Hello %s, %1$s', 'World')
    Result: Hello World, World

initcap(string)
    Returns: text. Convert the first letter of each word to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters.
    Example: initcap('hi THOMAS')
    Result: Hi Thomas

left(str text, n int)
    Returns: text. Return first n characters in the string. When n is negative, return all but last |n| characters.
    Example: left('abcde', 2)
    Result: ab

length(string)
    Returns: int. Number of characters in string.
    Example: length('jose')
    Result: 4

length(string bytea, encoding name)
    Returns: int. Number of characters in string in the given encoding. The string must be valid in this encoding.
    Example: length('jose', 'UTF8')
    Result: 4

lpad(string text, length int [, fill text])
    Returns: text. Fill up the string to length length by prepending the characters fill (a space by default). If the string is already longer than length then it is truncated (on the right).
    Example: lpad('hi', 5, 'xy')
    Result: xyxhi

ltrim(string text [, characters text])
    Returns: text. Remove the longest string containing only characters from characters (a space by default) from the start of string.
    Example: ltrim('zzzytest', 'xyz')
    Result: test

md5(string)
    Returns: text. Calculates the MD5 hash of string, returning the result in hexadecimal.
    Example: md5('abc')
    Result: 900150983cd24fb0d6963f7d28e17f72

pg_client_encoding()
    Returns: name. Current client encoding name.
    Example: pg_client_encoding()
    Result: SQL_ASCII

quote_ident(string text)
    Returns: text. Return the given string suitably quoted to be used as an identifier in an SQL statement string. Quotes are added only if necessary (i.e., if the string contains non-identifier characters or would be case-folded). Embedded quotes are properly doubled.
    Example: quote_ident('Foo bar')
    Result: "Foo bar"

quote_literal(string text)
    Returns: text. Return the given string suitably quoted to be used as a string literal in an SQL statement string. Embedded single-quotes and backslashes are properly doubled. Note that quote_literal returns null on null input; if the argument might be null, quote_nullable is often more suitable.
    Example: quote_literal(E'O\'Reilly')
    Result: 'O''Reilly'

quote_literal(value anyelement)
    Returns: text. Coerce the given value to text and then quote it as a literal. Embedded single-quotes and backslashes are properly doubled.
    Example: quote_literal(42.5)
    Result: '42.5'

quote_nullable(string text)
    Returns: text. Return the given string suitably quoted to be used as a string literal in an SQL statement string; or, if the argument is null, return NULL. Embedded single-quotes and backslashes are properly doubled.
    Example: quote_nullable(NULL)
    Result: NULL

quote_nullable(value anyelement)
    Returns: text. Coerce the given value to text and then quote it as a literal; or, if the argument is null, return NULL. Embedded single-quotes and backslashes are properly doubled.
    Example: quote_nullable(42.5)
    Result: '42.5'

regexp_matches(string text, pattern text [, flags text])
    Returns: setof text[]. Return all captured substrings resulting from matching a POSIX regular expression against the string. See Section 9.7 for more information.
    Example: regexp_matches('foobarbequebaz', '(bar)(beque)')
    Result: {bar,beque}

regexp_replace(string text, pattern text, replacement text [, flags text])
    Returns: text. Replace substring(s) matching a POSIX regular expression. See Section 9.7 for more information.
    Example: regexp_replace('Thomas', '.[mN]a.', 'M')
    Result: ThM

regexp_split_to_array(string text, pattern text [, flags text])
    Returns: text[]. Split string using a POSIX regular expression as the delimiter. See Section 9.7 for more information.
    Example: regexp_split_to_array('hello world', E'\\s+')
    Result: {hello,world}

regexp_split_to_table(string text, pattern text [, flags text])
    Returns: setof text. Split string using a POSIX regular expression as the delimiter. See Section 9.7 for more information.
    Example: regexp_split_to_table('hello world', E'\\s+')
    Result: hello
            world
            (2 rows)

repeat(string text, number int)
    Returns: text. Repeat string the specified number of times.
    Example: repeat('Pg', 4)
    Result: PgPgPgPg

replace(string text, from text, to text)
    Returns: text. Replace all occurrences in string of substring from with substring to.
    Example: replace('abcdefabcdef', 'cd', 'XX')
    Result: abXXefabXXef

reverse(str)
    Returns: text. Return reversed string.
    Example: reverse('abcde')
    Result: edcba

right(str text, n int)
    Returns: text. Return last n characters in the string. When n is negative, return all but first |n| characters.
    Example: right('abcde', 2)
    Result: de

rpad(string text, length int [, fill text])
    Returns: text. Fill up the string to length length by appending the characters fill (a space by default). If the string is already longer than length then it is truncated.
    Example: rpad('hi', 5, 'xy')
    Result: hixyx

rtrim(string text [, characters text])
    Returns: text. Remove the longest string containing only characters from characters (a space by default) from the end of string.
    Example: rtrim('testxxzx', 'xyz')
    Result: test

split_part(string text, delimiter text, field int)
    Returns: text. Split string on delimiter and return the given field (counting from one).
    Example: split_part('abc~@~def~@~ghi', '~@~', 2)
    Result: def

strpos(string, substring)
    Returns: int. Location of specified substring (same as position(substring in string), but note the reversed argument order).
    Example: strpos('high', 'ig')
    Result: 2

substr(string, from [, count])
    Returns: text. Extract substring (same as substring(string from from for count)).
    Example: substr('alphabet', 3, 2)
    Result: ph

to_ascii(string text [, encoding text])
    Returns: text. Convert string to ASCII from another encoding (only supports conversion from LATIN1, LATIN2, LATIN9, and WIN1250 encodings).
    Example: to_ascii('Karel')
    Result: Karel

to_hex(number int or bigint)
    Returns: text. Convert number to its equivalent hexadecimal representation.
    Example: to_hex(2147483647)
    Result: 7fffffff

translate(string text, from text, to text)
    Returns: text. Any character in string that matches a character in the from set is replaced by the corresponding character in the to set. If from is longer than to, occurrences of the extra characters in from are removed.
    Example: translate('12345', '143', 'ax')
    Result: a2x5

See also the aggregate function string_agg.
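The base64 and md5 rows of Table 9-7 can be cross-checked outside the database. The sketch below uses only the Python 3 standard library; the helper names are hypothetical and are not PostgreSQL APIs.

```python
# Sketch (Python 3): reproduce the base64 and md5 examples from the
# table with the standard library. Helper names are illustrative.
import base64
import hashlib

def decode_base64(text: str) -> bytes:
    """Binary data from its base64 text form (cf. decode(..., 'base64'))."""
    return base64.b64decode(text)

def encode_base64(data: bytes) -> str:
    """Base64 text form of binary data (cf. encode(..., 'base64'))."""
    return base64.b64encode(data).decode("ascii")

def md5_hex(text: str, encoding: str = "utf-8") -> str:
    """Hexadecimal MD5 digest of text; the encoding is an assumption."""
    return hashlib.md5(text.encode(encoding)).hexdigest()

if __name__ == "__main__":
    raw = decode_base64("MTIzAAE=")
    print(raw)                 # b'123\x00\x01'
    print(raw.hex())           # 3132330001
    print(encode_base64(raw))  # MTIzAAE=
    print(md5_hex("abc"))      # 900150983cd24fb0d6963f7d28e17f72
```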
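Table 9-8. Built-in Conversions

Conversion Name | Source Encoding | Destination Encoding
ascii_to_mic | SQL_ASCII | MULE_INTERNAL
ascii_to_utf8 | SQL_ASCII | UTF8
big5_to_euc_tw | BIG5 | EUC_TW
big5_to_mic | BIG5 | MULE_INTERNAL
big5_to_utf8 | BIG5 | UTF8
euc_cn_to_mic | EUC_CN | MULE_INTERNAL
euc_cn_to_utf8 | EUC_CN | UTF8
euc_jp_to_mic | EUC_JP | MULE_INTERNAL
euc_jp_to_sjis | EUC_JP | SJIS
euc_jp_to_utf8 | EUC_JP | UTF8
euc_kr_to_mic | EUC_KR | MULE_INTERNAL
euc_kr_to_utf8 | EUC_KR | UTF8
euc_tw_to_big5 | EUC_TW | BIG5
euc_tw_to_mic | EUC_TW | MULE_INTERNAL
euc_tw_to_utf8 | EUC_TW | UTF8
gb18030_to_utf8 | GB18030 | UTF8
gbk_to_utf8 | GBK | UTF8
iso_8859_10_to_utf8 | LATIN6 | UTF8
iso_8859_13_to_utf8 | LATIN7 | UTF8
iso_8859_14_to_utf8 | LATIN8 | UTF8
iso_8859_15_to_utf8 | LATIN9 | UTF8
iso_8859_16_to_utf8 | LATIN10 | UTF8
iso_8859_1_to_mic | LATIN1 | MULE_INTERNAL
iso_8859_1_to_utf8 | LATIN1 | UTF8
iso_8859_2_to_mic | LATIN2 | MULE_INTERNAL
iso_8859_2_to_utf8 | LATIN2 | UTF8
iso_8859_2_to_windows_1250 | LATIN2 | WIN1250
iso_8859_3_to_mic | LATIN3 | MULE_INTERNAL
iso_8859_3_to_utf8 | LATIN3 | UTF8
iso_8859_4_to_mic | LATIN4 | MULE_INTERNAL
iso_8859_4_to_utf8 | LATIN4 | UTF8
iso_8859_5_to_koi8_r | ISO_8859_5 | KOI8R
iso_8859_5_to_mic | ISO_8859_5 | MULE_INTERNAL
iso_8859_5_to_utf8 | ISO_8859_5 | UTF8
iso_8859_5_to_windows_1251 | ISO_8859_5 | WIN1251
iso_8859_5_to_windows_866 | ISO_8859_5 | WIN866
iso_8859_6_to_utf8 | ISO_8859_6 | UTF8
iso_8859_7_to_utf8 | ISO_8859_7 | UTF8
iso_8859_8_to_utf8 | ISO_8859_8 | UTF8
iso_8859_9_to_utf8 | LATIN5 | UTF8
johab_to_utf8 | JOHAB | UTF8
koi8_r_to_iso_8859_5 | KOI8R | ISO_8859_5
koi8_r_to_mic | KOI8R | MULE_INTERNAL
koi8_r_to_utf8 | KOI8R | UTF8
koi8_r_to_windows_1251 | KOI8R | WIN1251
koi8_r_to_windows_866 | KOI8R | WIN866
koi8_u_to_utf8 | KOI8U | UTF8
mic_to_ascii | MULE_INTERNAL | SQL_ASCII
mic_to_big5 | MULE_INTERNAL | BIG5
mic_to_euc_cn | MULE_INTERNAL | EUC_CN
mic_to_euc_jp | MULE_INTERNAL | EUC_JP
mic_to_euc_kr | MULE_INTERNAL | EUC_KR
mic_to_euc_tw | MULE_INTERNAL | EUC_TW
mic_to_iso_8859_1 | MULE_INTERNAL | LATIN1
mic_to_iso_8859_2 | MULE_INTERNAL | LATIN2
mic_to_iso_8859_3 | MULE_INTERNAL | LATIN3
mic_to_iso_8859_4 | MULE_INTERNAL | LATIN4
mic_to_iso_8859_5 | MULE_INTERNAL | ISO_8859_5
mic_to_koi8_r | MULE_INTERNAL | KOI8R
mic_to_sjis | MULE_INTERNAL | SJIS
mic_to_windows_1250 | MULE_INTERNAL | WIN1250
mic_to_windows_1251 | MULE_INTERNAL | WIN1251
mic_to_windows_866 | MULE_INTERNAL | WIN866
sjis_to_euc_jp | SJIS | EUC_JP
sjis_to_mic | SJIS | MULE_INTERNAL
sjis_to_utf8 | SJIS | UTF8
tcvn_to_utf8 | WIN1258 | UTF8
uhc_to_utf8 | UHC | UTF8
utf8_to_ascii | UTF8 | SQL_ASCII
utf8_to_big5 | UTF8 | BIG5
utf8_to_euc_cn | UTF8 | EUC_CN
utf8_to_euc_jp | UTF8 | EUC_JP
utf8_to_euc_kr | UTF8 | EUC_KR
utf8_to_euc_tw | UTF8 | EUC_TW
utf8_to_gb18030 | UTF8 | GB18030
utf8_to_gbk | UTF8 | GBK
utf8_to_iso_8859_1 | UTF8 | LATIN1
utf8_to_iso_8859_10 | UTF8 | LATIN6
utf8_to_iso_8859_13 | UTF8 | LATIN7
utf8_to_iso_8859_14 | UTF8 | LATIN8
utf8_to_iso_8859_15 | UTF8 | LATIN9
utf8_to_iso_8859_16 | UTF8 | LATIN10
utf8_to_iso_8859_2 | UTF8 | LATIN2
utf8_to_iso_8859_3 | UTF8 | LATIN3
utf8_to_iso_8859_4 | UTF8 | LATIN4
utf8_to_iso_8859_5 | UTF8 | ISO_8859_5
utf8_to_iso_8859_6 | UTF8 | ISO_8859_6
utf8_to_iso_8859_7 | UTF8 | ISO_8859_7
utf8_to_iso_8859_8 | UTF8 | ISO_8859_8
utf8_to_iso_8859_9 | UTF8 | LATIN5
utf8_to_johab | UTF8 | JOHAB
utf8_to_koi8_r | UTF8 | KOI8R
utf8_to_koi8_u | UTF8 | KOI8U
utf8_to_sjis | UTF8 | SJIS
utf8_to_tcvn | UTF8 | WIN1258
utf8_to_uhc | UTF8 | UHC
utf8_to_windows_1250 | UTF8 | WIN1250
utf8_to_windows_1251 | UTF8 | WIN1251
utf8_to_windows_1252 | UTF8 | WIN1252
utf8_to_windows_1253 | UTF8 | WIN1253
utf8_to_windows_1254 | UTF8 | WIN1254
utf8_to_windows_1255 | UTF8 | WIN1255
utf8_to_windows_1256 | UTF8 | WIN1256
utf8_to_windows_1257 | UTF8 | WIN1257
utf8_to_windows_866 | UTF8 | WIN866
utf8_to_windows_874 | UTF8 | WIN874
windows_1250_to_iso_8859_2 | WIN1250 | LATIN2
windows_1250_to_mic | WIN1250 | MULE_INTERNAL
windows_1250_to_utf8 | WIN1250 | UTF8
windows_1251_to_iso_8859_5 | WIN1251 | ISO_8859_5
windows_1251_to_koi8_r | WIN1251 | KOI8R
windows_1251_to_mic | WIN1251 | MULE_INTERNAL
windows_1251_to_utf8 | WIN1251 | UTF8
windows_1251_to_windows_866 | WIN1251 | WIN866
windows_1252_to_utf8 | WIN1252 | UTF8
windows_1256_to_utf8 | WIN1256 | UTF8
windows_866_to_iso_8859_5 | WIN866 | ISO_8859_5
windows_866_to_koi8_r | WIN866 | KOI8R
windows_866_to_mic | WIN866 | MULE_INTERNAL
windows_866_to_utf8 | WIN866 | UTF8
windows_866_to_windows_1251 | WIN866 | WIN1251
windows_874_to_utf8 | WIN874 | UTF8
euc_jis_2004_to_utf8 | EUC_JIS_2004 | UTF8
utf8_to_euc_jis_2004 | UTF8 | EUC_JIS_2004
shift_jis_2004_to_utf8 | SHIFT_JIS_2004 | UTF8
utf8_to_shift_jis_2004 | UTF8 | SHIFT_JIS_2004
euc_jis_2004_to_shift_jis_2004 | EUC_JIS_2004 | SHIFT_JIS_2004
shift_jis_2004_to_euc_jis_2004 | SHIFT_JIS_2004 | EUC_JIS_2004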
The conversion names follow a standard naming scheme: the
official name of the source encoding with all
non-alphanumeric characters replaced by underscores,
followed by _to_, followed by the
similarly processed destination encoding name. Therefore,
the names might deviate from the customary encoding names.

Example code for functions that convert characters to ASCII values in JS
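This article introduces example code for functions that convert characters to ASCII values in JS. Readers who need it are welcome to use it as a reference; hopefully it helps.
Character to ASCII code: use charCodeAt(). ASCII code to character: use fromCharCode().
Here is a small example. The code is as follows:
<script>
var str = "A";
var code = str.charCodeAt();                // code of the first character
var str2 = String.fromCharCode(code);       // back to the character
var str3 = String.fromCharCode(0x60 + 26);  // 0x60 + 26 = 122
document.write(code + '<br />');
document.write(str2 + '<br />');
document.write(str3);
</script>
Output:
65
A
z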