1
JohnDHH 2017-01-06 10:05:14 +08:00
In [1]: from bs4 import BeautifulSoup
In [2]: soup = BeautifulSoup('''<p data-page-model="text">a</p>\n <p data-page="text">b</p>''', "html.parser")
In [3]: soup.find_all("p", attrs={'data-page-model':'text'})
Out[3]: [<p data-page-model="text">a</p>]
In [4]: soup.find_all("p", attrs={'data-page':'text'})
Out[4]: [<p data-page="text">b</p>]
2
mymusise 2017-01-06 22:16:04 +08:00
How about running a replace on the HTML document before parsing it?
html.replace('</p>, <p data-page-model="text">','')