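Problem: the data I crawl is JSON, but on every crawl a few links come back with incomplete JSON, only about half of it, and the code raises this error (Unterminated string starting at: line 1 column 13042 (char 13041)). When I fetched just those failing links on their own, the data was never incomplete. Very strange. Here is the code that sends the requests for these links: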
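```python
import json
import time

import scrapy

# Inside a spider method; `href` is the query string of one listing.
yield scrapy.Request(
    # Concatenated URL: millisecond timestamp plus the listing's query string
    url='http://www.lzxxxx.com/wspsp/wxs/find_FjztByClh?msg=%s&%s' % (int(round(time.time() * 1000)), href),
    callback=self.roomstate_item,
    # This Referer is required; without it the server rejects the request as abnormal access
    headers={'Referer': 'http://www.lzxxxx.com/wspsp/web/swb/list-xmxx.jsp'},
)

def roomstate_item(self, response):
    try:
        j = json.loads(response.text)
        r_list = j.get('rows')
        for r in r_list:
            roomstateitem = RoomStateItem()  # item class from the project's items.py
            roomstateitem['F1'] = r.get('zts')
            roomstateitem['roomid'] = r.get('id')
            yield roomstateitem
    except Exception as e:
        self.logger.info(e)
        self.logger.info(response.url)
        self.logger.info(response.text)
        # Append instead of overwrite so every failing body is kept; UTF-8 for the Chinese text
        with open('bbb.txt', 'a', encoding='utf-8') as f:
            f.write(response.text + '\n')
```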
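Here is the incomplete data I got. It has a beginning but no end (it was very long, so I deleted part of the middle; the rest is all in this format):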
{"rows":[{"zts":"不可售","id":"B9E8973CD11F4F93B61619FA4221451C","fwbm":"2501201703203002-101"},{"zts":"可售","id":"34D2AC4934BA49C6A22922EC255F521D","fwbm":"2501201703203002101"},{"zts":"已售","id":"911BAA3C7FFF4FC0B751E0A3A10EF266","fwbm":"2501201703203002102"},{"zts":"已售","id":"0F9DC02CC0DB419188525A844D8D0AE0","fwbm":"2501201703203002705"},{"zts":"已售","id":"B06CF
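The error message itself is consistent with a body that was cut off: when the text ends inside a string value, Python's `json` module reports "Unterminated string starting at" together with the position of that string's opening quote. A minimal reproduction, using a shortened copy of the data above (the `{"rows":[` opening, one complete row, and the cut-off tail); the variable names are only for illustration:

```python
import json

# Shortened from the response above: the opening, one complete row,
# and the row that is cut off in the middle of its "id" value.
truncated = ('{"rows":[{"zts":"已售","id":"0F9DC02CC0DB419188525A844D8D0AE0",'
             '"fwbm":"2501201703203002705"},{"zts":"已售","id":"B06CF')

error_msg = None
chars_from_end = None
try:
    json.loads(truncated)
except json.JSONDecodeError as e:
    error_msg = e.msg  # message text without the "line/column" suffix
    # e.pos is the index of the opening quote of the unfinished string
    chars_from_end = len(truncated) - e.pos

print(error_msg)       # Unterminated string starting at
print(chars_from_end)  # 6
```

Applied to the real case, comparing the reported `char 13041` with `len(response.text)` shows whether the failure is at the very end of the body (truncation) or somewhere in the middle. If a body happened to be cut between tokens rather than inside a string, the message would be a different "Expecting ..." error instead.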
I'd appreciate any practical suggestions.
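Since the same links load fine when fetched one at a time, one guess (not confirmed anywhere in the post) is that the server cuts responses short when many requests hit it at once. If that is the cause, throttling the spider is cheap to try. A sketch of a `settings.py` fragment using standard Scrapy settings; the values are illustrative, not tuned for this site:

```python
# settings.py (fragment): send fewer simultaneous requests to the server.

# At most 2 requests in flight to the same domain (Scrapy's default is 8).
CONCURRENT_REQUESTS_PER_DOMAIN = 2

# Wait roughly this many seconds between requests to the same site
# (by default Scrapy randomizes it between 0.5x and 1.5x of this value).
DOWNLOAD_DELAY = 0.5

# Let AutoThrottle adjust the delay from observed response times;
# it does not go below DOWNLOAD_DELAY.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
```

If the number of broken responses drops noticeably with these settings, that supports the load theory; if not, the retry approach is still available.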
1
qianyin123 2019-07-08 11:27:24 +08:00
Paste more of the error output.
2
studyaa OP @qianyin123 json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 13042 (char 13041). That's the whole error; it's caused by the JSON data being incomplete.
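Because refetching the failing links individually works, retrying a request whose body fails to parse may be enough in practice. A minimal sketch under that assumption: the parse-or-retry decision is pulled into a plain function so it can be checked without Scrapy. `parse_rows_or_retry` and `MAX_JSON_RETRIES` are hypothetical names, not part of the original code or of Scrapy.

```python
import json

MAX_JSON_RETRIES = 3  # hypothetical cap on re-requests per URL


def parse_rows_or_retry(text, retries_so_far):
    """Parse one room-state response body.

    Returns (rows, retry):
      rows:  the list under "rows" if the body is complete JSON, else None
      retry: True if the caller should request the same URL again
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Truncated (or otherwise malformed) body: ask for a retry
        # until the cap is reached, then give up on this URL.
        return None, retries_so_far < MAX_JSON_RETRIES
    return data.get('rows') or [], False
```

Inside `roomstate_item` one could read a counter with `n = response.meta.get('json_retries', 0)` (a made-up meta key) and, when `retry` is true, yield `response.request.replace(dont_filter=True, meta={**response.meta, 'json_retries': n + 1})`. `dont_filter=True` matters because the repeat has the same URL as the original request, and Scrapy's duplicate filter would otherwise drop it. Scrapy 2.5 and later also provide `get_retry_request` in `scrapy.downloadermiddlewares.retry`, which does this bookkeeping using the `RETRY_TIMES` limit.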