    无限迭代中......: Python: Baidu Image Search (百度识图) similar-image crawler and download solution

    Time: 2021-07-19 22:28

    Solution

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    """
    @version: 0.0.1
    @author: ShenTuZhiGang
    @time: 2021/03/08 19:44
    @file: imagetest.py
    @function:
    @last modified by: ShenTuZhiGang
    @last modified time: 2021/03/08 19:44
    """
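    import json
    import os

    import cv2
    import numpy as np
    import requests
    from urllib.parse import urlparse, parse_qs

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
    }
    # Upload the query image to Baidu image search; the JSON response carries the result-page URL.
    upload_url = 'https://graph.baidu.com/upload?tn=pc&from=pc&image_source=PC_UPLOAD_IMAGE_MOVE&range={%22page_from%22:%20%22shituIndex%22}&extUiData%5bisLogoShow%5d=1'
    # Open the local image in a with-block so the file handle is closed after the upload.
    with open(r"C:\Users\Lenovo\Desktop/2.jpg", 'rb') as f:
        r = requests.post(upload_url, files={'image': f}, headers=headers).text
    url = json.loads(r)["data"]["url"]
    # The result URL carries a "sign" query parameter that identifies this search.
    o = urlparse(url)
    q = parse_qs(o.query, True)
    sign = q['sign'][0]
    # Fetch the result page as in the original flow; its body is not used below.
    r1 = requests.get(url, headers=headers).text
    # Request the similar-image list for this search.
    r0 = requests.get("https://graph.baidu.com/ajax/pcsimi?sign={}".format(sign)).text
    l = json.loads(r0)["data"]["list"]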
    
    for one in l:
        img_url = one["thumbUrl"]
        try:
            response = requests.get(img_url, timeout=1)
        except requests.RequestException as e:
            print(e)
            print("ERROR: download img failed or timed out.")
            continue  # no response to decode, move on to the next image

        try:
            # np.frombuffer replaces the deprecated np.fromstring for binary data
            imgDataNp = np.frombuffer(response.content, dtype='uint8')
            img = cv2.imdecode(imgDataNp, cv2.IMREAD_UNCHANGED)
            if img is None:
                # cv2.imdecode returns None when the bytes are not a valid image
                raise ValueError("image data could not be decoded")

            img_name = img_url.split('/')[-1]
            print(img_name)
            save_path = os.path.join(os.getcwd(), img_name)
            cv2.imwrite(save_path, img)
        except Exception as e:
            print(e)
            print("ERROR: download img corruption.")
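
The script above takes `sign` from the result URL's query string and builds each file name from the last segment of `thumbUrl`. The sketch below isolates those two steps as small helpers. The names `extract_sign` and `safe_image_name`, the `.jpg` fallback extension, and the example URLs are all assumptions made for illustration; none of them come from the original post or from Baidu's API. The file-name helper only matters if a thumbnail URL ends in a query string or lacks an extension, which may or may not happen with the URLs the service actually returns.

```python
import os
from urllib.parse import parse_qs, urlparse


def extract_sign(result_url):
    """Return the first `sign` query parameter of result_url, or None if absent."""
    query = parse_qs(urlparse(result_url).query)
    values = query.get("sign")
    return values[0] if values else None


def safe_image_name(img_url, default_ext=".jpg"):
    """Build a file name from the URL path only, ignoring any query string."""
    name = os.path.basename(urlparse(img_url).path)
    if not name:
        name = "image"  # hypothetical fallback for URLs that end in "/"
    if not os.path.splitext(name)[1]:
        # cv2.imwrite chooses the encoder from the file extension,
        # so a name without one gets the assumed default appended.
        name += default_ext
    return name


if __name__ == "__main__":
    # Hypothetical URLs, used only to illustrate the helpers.
    print(extract_sign("https://example.com/s?sign=abc123&f=all"))
    print(safe_image_name("https://example.com/img/cat.jpg?size=200"))
```

In the download loop, `img_name = safe_image_name(img_url)` could stand in for `img_url.split('/')[-1]`. Returning `None` from `extract_sign` instead of raising `KeyError` lets the caller decide how to handle a response without a `sign`.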

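    References

    爬取百度识图的结果 (Crawling the results of Baidu image search)

    目标检测-爬虫-利用百度识图的方法来批量的爬取图片生产数据集 (Object detection, crawler: batch-crawling images with Baidu image search to build a dataset)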
