Python UnicodeDecodeError

Code

import sys, os
import re
from bs4 import BeautifulSoup

# No encoding is given, so open() uses the locale default (cp950 here)
with open(File_Path+'\\business_card.html') as fp:
    soup = BeautifulSoup(fp, 'html.parser')

Error message

Traceback (most recent call last):
  File "D:\work temp\business card\re_name.py", line 35, in <module>
    soup = BeautifulSoup(fp,'html.parser')
  File "C:\Users\goodn\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bs4\__init__.py", line 309, in __init__
    markup = markup.read()
UnicodeDecodeError: 'cp950' codec can't decode byte 0xe5 in position 471: illegal multibyte sequence
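
The failure can be reproduced without BeautifulSoup. The sketch below is a minimal illustration, assuming the HTML file was saved as UTF-8. The sample string and the helper name can_decode are made up for this example and are not part of the original script. It shows that the same bytes decode cleanly as UTF-8 but are rejected by the cp950 codec.

```python
def can_decode(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes under `encoding` without error."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample: "名片" (business card) encoded as UTF-8.
# The first character becomes the bytes E5 90 8D. 0xE5 is a valid cp950
# lead byte, but 0x90 is not a valid trail byte, so cp950 rejects it.
sample = "<title>名片</title>".encode("utf-8")

print(can_decode(sample, "utf-8"))  # True
print(can_decode(sample, "cp950"))  # False
```

The 0xe5 reported in the traceback is consistent with this: it is the first byte of many common Chinese characters in UTF-8.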

Solved
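Specify the UTF-8 encoding explicitly when opening the file. Without an encoding argument, open() decodes text using the locale's default encoding, which is cp950 on this Traditional Chinese Windows system, while the HTML file is evidently saved as UTF-8.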

import sys, os
import re
from bs4 import BeautifulSoup

# Pass encoding='utf-8' so the file is decoded as UTF-8 rather than cp950
with open(File_Path+'\\business_card.html', encoding='utf-8') as fp:
    soup = BeautifulSoup(fp, 'html.parser')
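
To check which encoding open() would pick by default on a given machine, and to try the fix end to end, something like the following may help. It is a sketch using only the standard library. The temporary file stands in for business_card.html, and the helper name read_html_text is hypothetical, not from the original script.

```python
import locale
import os
import tempfile


def read_html_text(path, encoding="utf-8"):
    """Read a text file with an explicit encoding instead of the locale default."""
    with open(path, encoding=encoding) as fp:
        return fp.read()


# Hypothetical stand-in for business_card.html, written as UTF-8.
tmp_dir = tempfile.mkdtemp()
html_path = os.path.join(tmp_dir, "business_card.html")
with open(html_path, "w", encoding="utf-8") as fp:
    fp.write("<html><head><title>名片</title></head></html>")

# The encoding open() uses when none is given: 'cp950' on the machine
# that produced the traceback, often 'UTF-8' on Linux and macOS.
print(locale.getpreferredencoding(False))

html = read_html_text(html_path)
print("名片" in html)  # True
```

The returned string can be passed to BeautifulSoup(html, 'html.parser') in place of the file object.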