1
est 2011-02-27 15:51:12 +08:00
In UTF-8, a Chinese character is typically 3 bytes.
|
2
manhere 2011-02-27 16:09:13 +08:00
Could you use decode/encode to normalize everything to one encoding first, and then truncate?
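One way to read that suggestion, as a rough sketch only: decode the raw bytes to text, slice by characters, then encode to whatever encoding you want to store. This is written for Python 3 (the thread itself is on Python 2), and the function name `truncate_chars`, the parameter names, and the default encodings are all made up for illustration.

```python
def truncate_chars(raw: bytes, limit: int,
                   source_encoding: str = "gbk",
                   target_encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: cut to `limit` characters, not bytes."""
    # Decode first so that slicing works on whole characters.
    text = raw.decode(source_encoding)
    # Slice by character count, then re-encode to the target encoding.
    return text[:limit].encode(target_encoding)


raw = "哈哈哈哈".encode("gbk")          # 8 bytes in GBK
result = truncate_chars(raw, 2)
print(result.decode("utf-8"))           # 哈哈
print(len(result))                      # 6
```

Because the slice happens on decoded text, a multi-byte character never gets cut in half, whichever encoding the input came in.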
|
3
darasion 2011-02-27 16:41:27 +08:00
If the text is stored as a unicode object, one Chinese character counts as one "byte" (really one character, as far as len() is concerned).
If it is UTF-8, it counts as 3. If it is GBxxx, it counts as 2.
Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> len(u'哈哈')
2
>>> len(u'哈哈'.encode('gbk'))
4
>>> len(u'哈哈'.encode('utf-8'))
6
>>>
|
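If the limit really has to be in bytes (say, a fixed-width database column), the counts above mean a naive byte slice can land in the middle of a character. A minimal sketch of one way around that, assuming Python 3 and UTF-8 output; `truncate_utf8_bytes` is a hypothetical name, not something from the thread:

```python
def truncate_utf8_bytes(text: str, max_bytes: int) -> str:
    """Hypothetical helper: keep as many whole characters as fit in max_bytes."""
    # Cut the encoded form at the byte budget; this may split a character.
    clipped = text.encode("utf-8")[:max_bytes]
    # The input was valid UTF-8, so the only possible invalid part is an
    # incomplete sequence at the very end; errors="ignore" drops it.
    return clipped.decode("utf-8", errors="ignore")


print(truncate_utf8_bytes("哈哈", 4))   # 哈  (the second character does not fit)
print(truncate_utf8_bytes("哈哈", 6))   # 哈哈
```

The same idea should work for GBK by swapping the encoding name, though that is untested here.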