
    The Bog of Acamy: Scraping Douban TV Series Comments, Sentiment Analysis + Generation

    Posted: 2021-09-11 16:52


    1. Scraping the comments


    import codecs
    import random
    import time

    import requests
    from lxml import html

    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0'}

    # Parse the browser cookie string saved in cookie.txt ("k1=v1; k2=v2; ...")
    cookies = {}
    with open('cookie.txt', 'r') as f_cookies:
        for line in f_cookies.read().split(';'):
            if '=' not in line:
                continue  # skip empty fragments such as a trailing newline
            name, value = line.strip().split('=', 1)
            cookies[name] = value

    # Request the comment pages 20 at a time (start = 0, 20, ..., 480)
    for num in range(0, 500, 20):
        url = 'https://movie.douban.com/subject/26322642/comments?start=' + str(
            num) + '&limit=20&sort=new_score&status=P&percent_type='
        with codecs.open('comment.txt', 'a', encoding='utf-8') as f:
            try:
                r = requests.get(url, headers=header, cookies=cookies)
                result = html.fromstring(r.text)
                # Each comment body is the text of the <p> inside div.comment
                comment = result.xpath("//div[@class='comment']/p/text()")
                for i in comment:
                    f.write(i.strip() + '\r\n')
            except Exception as e:
                # Report the failure and move on to the next page
                print(e)
        # Wait a random 1.05 to 6 seconds between requests
        time.sleep(1 + float(random.randint(1, 100)) / 20)
    [Figure: comment scraping results]
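
The cookie-handling step of the scraper can be tried on its own, without touching the network. The following is only a minimal sketch: the helper name `parse_cookie_string` and the sample cookie values are made up for illustration and do not appear in the original script. It splits a browser-style cookie string on `;`, skips empty fragments, and splits each pair on the first `=` only, so values that themselves contain `=` survive.

```python
def parse_cookie_string(raw):
    """Turn 'k1=v1; k2=v2' into {'k1': 'v1', 'k2': 'v2'}.

    Hypothetical helper for illustration; not part of the original script.
    """
    cookies = {}
    for part in raw.split(';'):
        part = part.strip()
        if not part or '=' not in part:
            continue  # ignore empty or malformed fragments
        # Split on the first '=' only, so '=' inside the value is kept
        name, value = part.split('=', 1)
        cookies[name.strip()] = value.strip()
    return cookies


# Made-up sample values, shaped like a cookie string copied from a browser
sample = 'bid=abc123; ll="108288"; token=a=b; '
parsed = parse_cookie_string(sample)
print(parsed)
```

The resulting dictionary can be passed directly as the `cookies` argument of `requests.get`, as the script above does.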