python2.7 - Chinese text is garbled after Python writes it to a file
Problem description
A very simple little crawler program:
import urllib2

for i in L:
    content = urllib2.urlopen('http://X.X.X.X/cgi-bin/GetDomainOwnerInfo?domain=%s' % i)
    html = content.read()
    with open('domain_test.xml', 'a') as f:
        f.write(html)
    print html
The print output shows the Chinese text correctly:
<domaininfo strDomain='XXX.com.' strOwner='XXX' strDepartment='云平臺部' strBusiness='[互聯網業務系統 - XXX' strUser='XXX;'>
But when the XML file is opened directly, the text is garbled:
<domaininfo strDomain='XXX.com.' strOwner='XXX' strDepartment='?o‘?13??°é?¨' strBusiness='[?o’è?”??‘???????3???? - ?????‰?–1?o”?”¨]' StrUser='XXX;'>
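Windows 7 operating system, Python 2.7
Could anyone tell me how to solve this?
Answers
Answer 1: You need to know the encoding of content, and consider whether it needs to be converted.
You need to open the file as UTF-8 and then write to it.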
codecs.open(filename, mode[, encoding[, errors[, buffering]]])
Open an encoded file using the given mode and return a wrapped version providing transparent encoding/decoding. The default file mode is 'r' meaning to open the file in read mode.

Note: The wrapped version will only accept the object format defined by the codecs, i.e. Unicode objects for most built-in codecs. Output is also codec-dependent and will usually be Unicode as well.

Note: Files are always opened in binary mode, even if no binary mode was specified. This is done to avoid data loss due to encodings using 8-bit values. This means that no automatic conversion of '\n' is done on reading and writing.

encoding specifies the encoding which is to be used for the file.

errors may be given to define the error handling. It defaults to 'strict' which causes a ValueError to be raised in case an encoding error occurs.

buffering has the same meaning as for the built-in open() function. It defaults to line buffered.
import codecs
f = codecs.open('domain_test.xml', 'w', 'utf-8')

Answer 2:
Try adding # -*- coding: utf-8 -*- at the top of the file
Answer 3: Add #coding:utf-8 at the top of the file
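
To make Answer 1 concrete, here is a minimal sketch of decoding the response bytes and writing them through codecs.open. The helper name save_response and its source_encoding parameter are hypothetical, not part of the original question. The default of 'utf-8' is an assumption: the server's real encoding should be checked (for example from the Content-Type header) before relying on it.

```python
# -*- coding: utf-8 -*-
import codecs


def save_response(raw_bytes, path, source_encoding='utf-8'):
    """Decode raw response bytes and append them to `path` as UTF-8.

    `source_encoding` is an assumption about what the server sends;
    a wrong value raises UnicodeDecodeError instead of silently
    producing garbled text.
    """
    # Turn the byte string into a Unicode object first.
    text = raw_bytes.decode(source_encoding)
    # codecs.open encodes Unicode objects to UTF-8 on write.
    with codecs.open(path, 'a', 'utf-8') as f:
        f.write(text)
    return text
```

In the loop from the question, the with open(...) lines would become save_response(content.read(), 'domain_test.xml'). A file written this way should then be opened in an editor that is set to read UTF-8; an editor using a different default encoding can still display the same bytes as garbled characters.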
