Scraping Tmall Reviews for a 3D Printing Pen with Python (1)

2022-08-08

Like Vipshop, Tmall delivers its product information as JSON.

I ran into a few difficulties along the way, so here is a step-by-step walkthrough.

The first step, as before, is to find the JS request that carries Tmall's reviews.

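After opening the reviews tab, a file whose name starts with list_detail shows up among the page's network requests. That is the JS file we are looking for, and its response contains the reviews we need.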
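To see what that request is made of, its query string can be split into named parameters. The sketch below is only an illustration: `sample_url` is a shortened copy of the URL used in the code later in this post (the long `ua` token and several empty parameters are left out), and `query_params` is a hypothetical helper name, not something from Tmall or from the original script.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the request URL from this post; the `ua` token and
# several empty parameters are omitted here for readability.
sample_url = (
    "https://rate.tmall.com/list_detail_rate.htm"
    "?itemId=623333281518&spuId=0&sellerId=898146183"
    "&order=3&currentPage=1&callback=jsonp1841"
)


def query_params(url):
    """Return the query-string parameters of `url` as a flat dict."""
    parsed = parse_qs(urlsplit(url).query, keep_blank_values=True)
    # parse_qs returns a list of values per key; keep the first one.
    return {key: values[0] for key, values in parsed.items()}


params = query_params(sample_url)
print(params["itemId"], params["currentPage"], params["callback"])
```

The `callback` parameter is a hint that the response is wrapped in a function call (JSONP) rather than being bare JSON, which matters for the slicing step further down.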

Next come the usual imports: the requests, BeautifulSoup, and json packages.

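Tmall, too, has anti-scraping measures, so we need to log in to a Tmall account first and add headers (cookie, referer, user-agent) so that the request looks like it comes from a normal browser.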

Then locate the start and the end of the JSON we need inside the response.

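The easy mistake here is ends: the slice must stop at the closing braces and must not include the closing parenthesis of the JSONP wrapper.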
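The slicing step can be tried offline on a small string before touching the real response. This is a minimal sketch under assumptions: `sample` is a made-up, heavily trimmed response that only mimics the `jsonp1841(...)` wrapper and the `rateDetail`/`rateList` keys used in this post, and `extract_json` is a hypothetical helper name. It searches for the last `}}` rather than the first, on the assumption that a review text could itself contain `}}`.

```python
import json


def extract_json(jsonp_text, start_marker='{"rateDetail"'):
    """Slice the JSON object out of a JSONP response and parse it."""
    start = jsonp_text.find(start_marker)
    last = jsonp_text.rfind('}}')
    if start == -1 or last < start:
        # Typically means we got a login or verification page instead.
        raise ValueError("rateDetail JSON not found in response")
    # Stop right after the closing braces, before the ')' of the wrapper.
    end = last + len('}}')
    return json.loads(jsonp_text[start:end])


# Made-up sample shaped like the wrapper described in this post.
sample = 'jsonp1841({"rateDetail":{"rateList":[{"rateContent":"good pen","pics":[]}]}})'
data = extract_json(sample)
print(data["rateDetail"]["rateList"][0]["rateContent"])
```

Raising an error when the marker is missing makes it obvious when the cookie has expired and the server returned something other than reviews.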

Finally, since a buyer may post no photos at all or several of them, the photo lookup is wrapped in try.
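An alternative to catching the error is to normalise the photo field first. The sketch below assumes each review is a dict whose optional `pics` entry is a list of protocol-relative URLs (which is what prefixing `'http:'` in the script implies); `photo_urls` and the `img.example.com` addresses are hypothetical and used only for illustration.

```python
def photo_urls(review, scheme="http:"):
    """Return the full photo URLs of one review, or [] if it has none."""
    pics = review.get("pics")
    # Missing key, None, or an empty string all mean "no photos".
    if not isinstance(pics, list):
        return []
    return [scheme + pic for pic in pics]


# Made-up reviews covering the three cases: several photos, none, no key.
reviews = [
    {"rateContent": "Works well", "pics": ["//img.example.com/a.jpg", "//img.example.com/b.jpg"]},
    {"rateContent": "No photos", "pics": ""},
    {"rateContent": "Key missing"},
]
for review in reviews:
    print(review["rateContent"], photo_urls(review))
```

Because the helper loops over whatever is present, it does not need to know in advance how many photos a buyer uploaded.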

Finally, here is the code:

import requests
from bs4 import BeautifulSoup
import json
from urllib.parse import quote

# for i in range(1, 10):  # placeholder for paging over currentPage, not used yet
url="https://rate.tmall.com/list_detail_rate.htm?itemId=623333281518&spuId=0&sellerId=898146183&order=3&currentPage=1&append=0&content=1&tagId=&posi=&picture=&groupId=&ua=098%23E1hvvQvpvo%2BvUvCkvvvvvjiWRLcy6jYVRsqUtjD2PmPWzjtVR25W6j1HPsMOQjDCRvhvChCvvvm%2BvpvZo6DchEGvp59p756ERfvKo9Hl6we%2Bvpvbpyn%2FvR%2BvphBsVtVevpvhphvhH29CvvBvpvvvvvhvCyCUvvvvvU9Cvvamp9QheC36uvhvmhCvCvI%2FnzClKvhv8hCvvvvvvhCvphvwv9vvpcCvpCQmvvChNhCvjvUvvhBZphvwv9vvBHpUvpCWpPp0v8Wpwh%2BFp%2B0xhCDAo2Ec6aZtn0vHVAdyaXTAVAigEfmxdX3l8j6OfwowdeQEVAdyaXTAVAil8bmxdB9aUmx%2FQj7x%2B3%2BuJooQipgCvvpvvPMM39hvChCCvvm%2BvpvEphL1mbWvpR0sdvhvmZCmjBBovhVDUuQCvvDvplyWw9CmJFmvvpvZzCQbcDFNznQwp6ifqQLwhYQH7e9%3D&needFold=0&_ksTS=1659945119082_1840&callback=jsonp1841"

headers={
'cookie': 'lid=%E7%AC%94%E6%9D%86%E5%AD%90%E5%8A%9E%E5%85%AC%E6%97%97%E8%88%B0%E5%BA%97; hng=CN%7Czh-CN%7CCNY%7C156; t=4e7df19d381f55aa03e6aaf5211402b7; tracknick=%5Cu7B14%5Cu6746%5Cu5B50%5Cu529E%5Cu516C%5Cu65D7%5Cu8230%5Cu5E97; enc=L1%2BEWKfqEhWH1WILeWEF1KOiuDf2Cajd%2F0eZYzQgcI3e%2FsTc5rVan3hyj4mSQDEslXHbyj4chZunVGKjZ4fTTheXwGRUVwKZANtPTzFrMBg%3D; _tb_token_=56e6b0f765876; cookie2=12b5e4d3fd02a92811d9e7c5eb08f627; cna=Vuo7F5s2vysCAXFoyUirZNF7; xlly_s=1; dnk=%5Cu7B14%5Cu6746%5Cu5B50%5Cu529E%5Cu516C%5Cu65D7%5Cu8230%5Cu5E97; uc1=cookie15=URm48syIIVrSKA%3D%3D&existShop=true&cookie21=Vq8l%2BKCLivbS%2FaO0oky%2BWg%3D%3D&cookie14=UoexOzXp4uI3JQ%3D%3D&tmb=1&cookie16=W5iHLLyFPlMGbLDwA%2BdvAGZqLg%3D%3D&pas=0; uc3=vt3=F8dCv4G3YJZwZvY%2BTHY%3D&lg2=Vq8l%2BKCLz3%2F65A%3D%3D&id2=UUphwocR7BRT9edm5Q%3D%3D&nk2=0o8%2FnXBGkvTgGSdoApFcWQ%3D%3D; _l_g_=Ug%3D%3D; uc4=id4=0%40U2grGR8RjYrYeyzJ7CYb6fOOHfHliV0x&nk4=0%400D4kcvaqbgtfR8uEG8IjbABnAOosVj810Oaq; unb=2208092032956; lgc=%5Cu7B14%5Cu6746%5Cu5B50%5Cu529E%5Cu516C%5Cu65D7%5Cu8230%5Cu5E97; cookie1=BYlty4Hl0E049Ew0r6wFoRPzJm4uOVqjRMkbuC2MJuQ%3D; login=true; cookie17=UUphwocR7BRT9edm5Q%3D%3D; _nk_=%5Cu7B14%5Cu6746%5Cu5B50%5Cu529E%5Cu516C%5Cu65D7%5Cu8230%5Cu5E97; sgcookie=E100ok6QbYGbzieEtoAVpPspkXLTvBc%2FAKBqfUOYbF0M%2BSmex5gzXNGMA1YUkEaVKZAmZ9kBwhbi4ne2O9eh5iuLZ9Us7hCR5kQtxe%2F0rv06A%2BM%3D; cancelledSubSites=empty; sg=%E5%BA%9764; csg=fd892b3d; tfstk=cE6PBONl18ezEufNgK9UAJTCF2kRZPSlxx-6ZP9GaPHnjTAliK0pmYWk0Hiq-Qf..; l=eBgzLE7VLtIL9_j6BOfwlurza77tkIRAguPzaNbMiOCPOv59fELPW6Yxl2YpCnGVh67HR3uy_uqHBeYBqIq0x6aNa6Fy_iHmn; isg=BBoauRo3LOWBjaBJI5ecahr8a8A8S54lWzNGhSSTvq1Yl7rRDNjLNSvhZ2MLRxa9',
'referer': 'https://detail.tmall.com/item.htm?spm=a230r.1.14.73.6ef667c02zYiq1&id=623333281518&ns=1&abbucket=7&skuId=5049447779544',
'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36'
}
html = requests.get(url, headers=headers).text
# The response is JSONP, e.g. jsonp1841({...}); slice out only the JSON object.
start = html.find('{"rateDetail"')
# Search from the right so the slice ends at the last '}}', before the ')'.
ends = html.rfind('}}') + len('}}')
for p in json.loads(html[start:ends])['rateDetail']['rateList']:
    print(p['rateContent'])
    try:
        # Print every photo the buyer posted, however many there are.
        for pic in p['pics']:
            print('http:' + pic)
    except (KeyError, TypeError):
        # No 'pics' entry for this review: nothing to print.
        pass

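The script prints this page's reviews and the buyers' photo URLs (up to 5 per review). After that, all that remains is to sort and save them.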
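One possible way to save the results is a CSV file with one row per review. This is a sketch under assumptions, not the method used in this series: the column names, the `write_reviews` helper, and the sample data are all made up, and an in-memory buffer stands in for a real file so the example runs without touching disk.

```python
import csv
import io


def write_reviews(reviews, fileobj):
    """Write one CSV row per review: its text and its photo URLs."""
    writer = csv.writer(fileobj)
    writer.writerow(["content", "photos"])
    for review in reviews:
        pics = review.get("pics") or []
        # Join the photo URLs with spaces so they fit in a single cell.
        photos = " ".join("http:" + pic for pic in pics)
        writer.writerow([review.get("rateContent", ""), photos])


# Made-up reviews shaped like the entries of rateList in this post.
sample_reviews = [
    {"rateContent": "Works well", "pics": ["//img.example.com/a.jpg", "//img.example.com/b.jpg"]},
    {"rateContent": "No photos", "pics": []},
]
buffer = io.StringIO()
write_reviews(sample_reviews, buffer)
print(buffer.getvalue())
```

To write to disk instead, the buffer could be replaced with `open("reviews.csv", "w", newline="", encoding="utf-8")`, where the file name is again just an example.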

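This makes it quick to copy and save the photos from competitors' positive reviews instead of saving them by hand one at a time, which is far more efficient.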

A later post will cover paging through the reviews, so that all of them can be scraped in a single run.
