
    The Bog of Acamy: Scraping Douban TV Series Comments, Running Sentiment Analysis + Generating

    Posted: 2021-09-11 16:52

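    When we want to know whether a TV series or film is any good, we often go to Douban to check its rating and reviews. This post scrapes the Douban comments for one TV series, runs SnowNLP sentiment analysis on them, and finally generates a word cloud to give a quick visual impression.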

    1. Scraping the comments

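    We use 《猎场》, a series that was popular a while ago, as the example. Douban has anti-scraping measures, so each request must carry the cookie from a logged-in session, saved here in cookie.txt. The code and results are as follows: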

    import codecs
    import random
    import time

    import requests
    from lxml import html

    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0'}

    # cookie.txt holds the Cookie header copied from a logged-in browser session,
    # in the form "name1=value1; name2=value2"
    cookies = {}
    with open('cookie.txt', 'r') as f_cookies:
        for line in f_cookies.read().split(';'):
            if '=' not in line:
                continue  # skip empty fragments such as the one after a trailing ';'
            name, value = line.strip().split('=', 1)
            cookies[name] = value
    #print(cookies)

    # Request 25 pages of 20 comments each (start = 0, 20, ..., 480)
    for num in range(0, 500, 20):
        url = ('https://movie.douban.com/subject/26322642/comments?start=' + str(num)
               + '&limit=20&sort=new_score&status=P&percent_type=')
        with codecs.open('comment.txt', 'a', encoding='utf-8') as f:
            try:
                r = requests.get(url, headers=header, cookies=cookies, timeout=10)
                result = html.fromstring(r.text)
                # Take the text of every <p> directly under <div class="comment">
                comment = result.xpath("//div[@class='comment']/p/text()")
                for i in comment:
                    f.write(i.strip() + '\r\n')
            except Exception as e:
                print(e)
        # Pause a random 1.05 to 6 seconds between requests
        time.sleep(1 + float(random.randint(1, 100)) / 20)
    
    [Figure: comment scraping results]
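
    The cookie parsing and the paging logic in the script above can also be pulled out into two small helpers, which makes them easy to check without touching the network. The following is only a sketch: the function names parse_cookie_string and comment_page_urls are made up for illustration, and the cookie names and values in the demo are placeholders, not real Douban cookies.

```python
from urllib.parse import urlencode


def parse_cookie_string(raw):
    """Turn a copied Cookie header such as 'a=1; b=2' into a dict."""
    cookies = {}
    for pair in raw.split(';'):
        pair = pair.strip()
        if '=' not in pair:
            # Skip empty or malformed fragments, e.g. after a trailing ';'
            continue
        # Split on the first '=' only, since values may contain '=' themselves
        name, value = pair.split('=', 1)
        cookies[name.strip()] = value.strip()
    return cookies


def comment_page_urls(subject_id, total=500, page_size=20):
    """Yield one comments URL per page, using the same query string as the script."""
    base = 'https://movie.douban.com/subject/{}/comments'.format(subject_id)
    for start in range(0, total, page_size):
        query = urlencode({
            'start': start,
            'limit': page_size,
            'sort': 'new_score',
            'status': 'P',
            'percent_type': '',
        })
        yield base + '?' + query


if __name__ == '__main__':
    # Placeholder cookie string, for illustration only
    print(parse_cookie_string('bid=abc123; dbcl2="42:xyz=="; '))
    print(next(comment_page_urls('26322642')))
```

    The resulting dict can be passed to requests.get(..., cookies=...) exactly as in the script, and each generated URL corresponds to one iteration of its for loop.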