As shown below, I want to get the text under the a tag so that aaabbbccc is a single value in the list, rather than ["aaa","bbb","ccc"]. How should I handle this?
from lxml import etree
html_str='''
<span class="til">
<a href="http://www.xxxx.com">
"aaa"
<br>
"bbb"
"ccc"
<br>
</a>
</span>
'''
html = etree.HTML(html_str)
content = html.xpath('//a/text()')
print(content)
"""
output:
['\n "aaa"\n ', '\n "bbb"\n "ccc"\n ', '\n ']
"""
1
ch2 2021-03-24 16:14:32 +08:00
Switch to BeautifulSoup and take node.text
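A minimal sketch of what this suggestion might look like, assuming the `bs4` package is installed and using the standard-library `html.parser` backend (neither is mentioned in the thread). It uses `get_text()`, which returns the same concatenated text as `node.text`; the whitespace and quote stripping is an extra step added here to match the desired `aaabbbccc` result.

```python
from bs4 import BeautifulSoup

html_str = '''
<span class="til">
<a href="http://www.xxxx.com">
"aaa"
<br>
"bbb"
"ccc"
<br>
</a>
</span>
'''

soup = BeautifulSoup(html_str, 'html.parser')
node = soup.find('a')

# get_text() concatenates every text node under the tag, newlines included
raw = node.get_text()

# split() with no arguments drops all whitespace runs; the quotes are
# literal characters in this sample, so they are removed separately
cleaned = ''.join(raw.split()).replace('"', '')
print([cleaned])
```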
|
2
QuinceyWu 2021-03-24 16:28:28 +08:00
price = [x.strip() for x in content if x.strip() != '']
str1 = price[1].replace(" ", "").replace("\n", '').replace('"', "")
str2 = price[0].replace('"', '')
print(str2+str1)
|
3
meiyoumingzi6 2021-03-24 16:32:24 +08:00
You already have the list, so why not just join it together?
|
4
mekingname 2021-03-24 16:35:27 +08:00
content = ''.join(x.strip() for x in html.xpath('//a/text()'))
|
5
polarpy 2021-03-24 16:41:29 +08:00
Replace the newlines and spaces in the extracted values.
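One way this could be done, as a rough sketch: join the text nodes first, then drop every whitespace run and the literal double quotes. It assumes the quotes really are part of the page text, as in the sample above; if they are not, the final `replace` can be omitted.

```python
from lxml import etree

html_str = '''
<span class="til">
<a href="http://www.xxxx.com">
"aaa"
<br>
"bbb"
"ccc"
<br>
</a>
</span>
'''

html = etree.HTML(html_str)
parts = html.xpath('//a/text()')

# str.split() with no arguments splits on any whitespace run,
# so joining the pieces again removes spaces and newlines alike
joined = ''.join(''.join(parts).split())

# Remove the literal double quotes that surround each fragment
cleaned = joined.replace('"', '')
print([cleaned])
```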
|
6
mrleohe 2021-03-24 16:48:05 +08:00
''.join([i.strip() for i in ''.join(html.xpath('//a/text()')).split('"') ])
|
7
CLCLCLCLCL 2021-03-25 12:04:46 +08:00
html = etree.HTML(html_str)
content = html.xpath('string(//a)')
Just use string() directly.
|
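A short sketch of how `string()` behaves here. In XPath 1.0, `string()` applied to a node-set returns the string value of the first node in document order only, and the result still contains the original newlines. The related XPath function `normalize-space()` is swapped in below as one possible way to tidy the whitespace; it is not something the reply itself proposed.

```python
from lxml import etree

html_str = '''
<span class="til">
<a href="http://www.xxxx.com">
"aaa"
<br>
"bbb"
"ccc"
<br>
</a>
</span>
'''

html = etree.HTML(html_str)

# string() returns one str for the first matching <a>, whitespace included
raw = html.xpath('string(//a)')

# normalize-space() trims the ends and collapses whitespace runs to one space
normalized = html.xpath('normalize-space(//a)')

# Drop the remaining single spaces and the literal quotes
cleaned = normalized.replace(' ', '').replace('"', '')
print(repr(normalized))
print([cleaned])
```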
8
2bin OP
@CLCLCLCLCL I tried it, and it seems to extract only the first a tag. When there are several a tags, I don't know how to extract the rest.
|
9
zyb201314 2021-03-26 00:31:45 +08:00 via Android
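# Like this?
html = etree.HTML(html_str)
lst = []
for a in html.xpath('//span//a'):
    # Collect every text node under this <a>, including nested ones
    content = a.xpath('.//text()')
    # Drop all whitespace, then remove the literal double quotes
    text = ''.join("".join(content).split()).replace('"', "")
    lst.append(text)
print(lst)
|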
10
CLCLCLCLCL 2021-03-26 11:07:34 +08:00
@2bin Yes, just loop over the a tags. It depends on which approach you want to use.
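A possible sketch of that loop, evaluating `string(.)` relative to each a element so that every link yields its own value. The second a tag in the markup and the helper name `link_texts` are hypothetical, added only to show the multi-tag case.

```python
from lxml import etree

# Hypothetical markup with two <a> tags, for illustration only
html_str = '''
<span class="til">
<a href="http://www.xxxx.com">
"aaa"
<br>
"bbb"
"ccc"
<br>
</a>
<a href="http://www.xxxx.com">
"ddd"
<br>
"eee"
</a>
</span>
'''

html = etree.HTML(html_str)


def link_texts(tree):
    """Return one cleaned string per <a> element (hypothetical helper)."""
    results = []
    for a in tree.xpath('//a'):
        # string(.) is evaluated against this <a> only, not the whole document
        raw = a.xpath('string(.)')
        # Remove all whitespace and the literal double quotes
        results.append(''.join(raw.split()).replace('"', ''))
    return results


texts = link_texts(html)
print(texts)
```

Running the XPath on each element avoids the first-node-only behaviour of `string(//a)` while still producing a single string per link.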
|
11
dongxiao 2021-03-26 15:36:17 +08:00
html.xpath("string(//a)")
|
12
2bin OP |