
    qq262593421's blog: Offline installation of a Python 3 crawler environment on Windows

    Posted: 2021-08-17 21:50

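    Contents

    I. Offline installation of Python 3.6.8

    II. Downloading the dependency modules for offline use

    III. Installing the crawler modules offline

    IV. Downloading and installing browser drivers

    V. Verifying versions and dependencies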


    I. Offline installation of Python 3.6.8

    Python download page 1: https://www.python.org/downloads/

    Python download page 2: https://www.python.org/ftp/python/3.6.8/

    Windows installer: python-3.6.8-amd64.exe

    Windows portable (embeddable) build: python-3.6.8-embed-amd64.zip

    Source tarball (to compile yourself): Python-3.6.8.tgz

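    II. Downloading the dependency modules for offline use

    Search page for Python 3.6 modules: https://pypi.org/search/?c=Programming+Language+%3A%3A+Python+%3A%3A+3.6

    Prebuilt Python extension packages for Windows: https://www.lfd.uci.edu/~gohlke/pythonlibs/

    Selenium documentation (Chinese): https://python-selenium-zh.readthedocs.io/zh_CN/latest/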

    Dependency modules for the Python crawler
    Purpose | Module | Official page | Package file
    pip dependency | setuptools | https://pypi.org/project/setuptools/ | setuptools-51.0.0-py3-none-any.whl
    Module installer | pip | https://pypi.org/project/pip/ | pip-20.3.3-py2.py3-none-any.whl
    requests dependency | certifi | https://pypi.org/project/certifi/ | certifi-2020.12.5-py2.py3-none-any.whl
    requests dependency | chardet | https://pypi.org/project/chardet/ | chardet-4.0.0-py2.py3-none-any.whl
    requests dependency | idna | https://pypi.org/project/idna/ | idna-2.10-py2.py3-none-any.whl
    requests dependency | urllib3 | https://pypi.org/project/urllib3/ | urllib3-1.26.2-py2.py3-none-any.whl
    HTTP library | requests | https://pypi.org/project/requests/ | requests-2.25.1-py2.py3-none-any.whl
    XML parsing library | lxml | https://pypi.org/project/lxml/ | lxml-4.6.2-cp36-cp36m-win_amd64.whl
    Browser automation framework | selenium | https://pypi.org/project/selenium/ | selenium-3.141.0-py2.py3-none-any.whl
    Text recognition (OCR) library | pytesseract | https://pypi.org/project/pytesseract/ | pytesseract-0.3.7.tar.gz
    tesserocr dependency | tesseract | https://pypi.org/project/tesseract/ | tesseract-0.1.3.tar.gz
    Image recognition library | tesserocr | https://pypi.org/project/tesserocr/ ; https://github.com/simonflueckiger/tesserocr-windows_build/releases | tesserocr-2.5.1.tar.gz ; tesserocr-2.4.0-cp36-cp36m-win_amd64.whl
    OCR engine | tesseract-ocr | https://digi.bib.uni-mannheim.de/tesseract/ | tesseract-ocr-w64-setup-v4.0.0.20181030.exe
    Matrix and array computation library | numpy | https://pypi.org/project/numpy/ | numpy-1.19.4-cp36-cp36m-win_amd64.whl
    Computer vision library | opencv-python | https://pypi.org/project/opencv-python/ | opencv_python-4.4.0.46-cp36-cp36m-win_amd64.whl
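
The package file names above encode which interpreter and platform each wheel targets (for example cp36-cp36m-win_amd64). Before copying files to the offline machine, it can help to check that every wheel matches 64-bit CPython 3.6. The following is a minimal sketch that assumes the standard wheel naming scheme (name-version-python tag-ABI tag-platform tag); the function names are made up for illustration, and the check is deliberately simplified compared with what pip does.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its name/version/tag components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 6:
        # An optional build tag sits between the version and the python tag.
        name, version, _build, python_tag, abi_tag, platform_tag = parts
    elif len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel file name: " + filename)
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def fits_cpython36_win64(filename):
    """Rough check that a wheel can be installed on 64-bit CPython 3.6 for Windows."""
    info = parse_wheel_filename(filename)
    # A python tag such as "py2.py3" lists several supported interpreters.
    python_ok = any(tag in ("py3", "cp36") for tag in info["python"].split("."))
    abi_ok = info["abi"] in ("none", "cp36m", "abi3")
    platform_ok = info["platform"] in ("any", "win_amd64")
    return python_ok and abi_ok and platform_ok
```

A wheel built for another interpreter, such as one tagged cp37-cp37m, would be rejected by this check, which is the kind of mismatch that otherwise only shows up when pip refuses the file on the offline machine.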

    III. Installing the crawler modules offline

    1. Installing .whl packages offline

    python -m pip install --upgrade setuptools-51.0.0-py3-none-any.whl 
    python -m pip install --upgrade pip-20.3.3-py2.py3-none-any.whl 
    python -m pip install certifi-2020.12.5-py2.py3-none-any.whl 
    python -m pip install chardet-4.0.0-py2.py3-none-any.whl 
    python -m pip install idna-2.10-py2.py3-none-any.whl 
    python -m pip install urllib3-1.26.2-py2.py3-none-any.whl
    python -m pip install requests-2.25.1-py2.py3-none-any.whl 
    python -m pip install lxml-4.6.2-cp36-cp36m-win_amd64.whl 
    python -m pip install selenium-3.141.0-py2.py3-none-any.whl 
    python -m pip install tesserocr-2.4.0-cp36-cp36m-win_amd64.whl
    python -m pip install numpy-1.19.4-cp36-cp36m-win_amd64.whl
    python -m pip install opencv_python-4.4.0.46-cp36-cp36m-win_amd64.whl
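
The order of the commands above matters: the libraries that requests depends on are installed before requests itself. As a rough sketch, the same sequence can be scripted. The folder passed to install_all is a hypothetical location for the downloaded wheels, and the helper names are made up for illustration. The pip options --no-index (do not contact PyPI) and --find-links (look for packages in a local folder) are additions not used in the original commands. setuptools and pip are left out of the list because the original commands upgrade them separately with --upgrade.

```python
import os
import subprocess
import sys

# Same order as the manual commands: dependencies before the packages that need them.
WHEELS = (
    "certifi-2020.12.5-py2.py3-none-any.whl",
    "chardet-4.0.0-py2.py3-none-any.whl",
    "idna-2.10-py2.py3-none-any.whl",
    "urllib3-1.26.2-py2.py3-none-any.whl",
    "requests-2.25.1-py2.py3-none-any.whl",
    "lxml-4.6.2-cp36-cp36m-win_amd64.whl",
    "selenium-3.141.0-py2.py3-none-any.whl",
    "tesserocr-2.4.0-cp36-cp36m-win_amd64.whl",
    "numpy-1.19.4-cp36-cp36m-win_amd64.whl",
    "opencv_python-4.4.0.46-cp36-cp36m-win_amd64.whl",
)


def build_install_command(wheel_dir, wheel_name):
    """Build one 'python -m pip install' command that never touches the network."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",                # do not query PyPI
        "--find-links", wheel_dir,   # resolve dependencies from the local folder
        os.path.join(wheel_dir, wheel_name),
    ]


def install_all(wheel_dir, wheels=WHEELS, dry_run=True):
    """Return the commands in order; run them only when dry_run is False."""
    commands = [build_install_command(wheel_dir, name) for name in wheels]
    if not dry_run:
        for command in commands:
            subprocess.check_call(command)
    return commands
```

Calling install_all(r"D:\wheels") only returns the commands for inspection; passing dry_run=False would run them one by one and stop at the first failure.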

    2. Installing .tar.gz packages offline
    Extract the archive, cd into the extracted directory, and run:

    python setup.py install
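
The extract-then-install step can also be sketched in Python with the standard tarfile module. The function name and the paths in the usage note are hypothetical; the sketch only unpacks the archive and locates the folder that contains setup.py, which is where the command above would be run.

```python
import os
import tarfile


def extract_sdist(archive_path, dest_dir):
    """Unpack a .tar.gz source package and return the folder that holds setup.py."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
    # Source archives normally contain a single top-level folder with setup.py in it.
    for root, _dirs, files in os.walk(dest_dir):
        if "setup.py" in files:
            return root
    raise FileNotFoundError("no setup.py found in " + archive_path)
```

For example, folder = extract_sdist("pytesseract-0.3.7.tar.gz", "build_tmp") would return the directory in which python setup.py install is then executed.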

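    3. Installing tesseract-ocr

    Installation tutorial for Python tesserocr: https://jingyan.baidu.com/article/6b18230972e3e6fb59e15909.html

    (1) During installation, select the option to download the additional language data.

    (2) Add the Tesseract-OCR directory to the PATH environment variable.

    (3) After installation, copy the tessdata folder from the Tesseract-OCR root directory into the Python root directory; otherwise an error like the following is raised:

    RuntimeError: Failed to init API, possibly an invalid tessdata path: D:\Python\Python36\Python368\/tessdata/

    (4) Set the tesseract_cmd variable to the path of the installed tesseract.exe: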

    from PIL import Image
    import pytesseract
    ## When using pytesseract, tesseract_cmd must be pointed at tesseract.exe manually
    pytesseract.pytesseract.tesseract_cmd = r'D:\Python\install-depend\Tesseract-OCR\tesseract.exe'
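
A hard-coded path fails quietly if Tesseract-OCR was installed somewhere else. As a small sketch, the path can be checked before it is assigned. The helper names below are made up for illustration, and the fallback to the PATH lookup assumes step (2) above was carried out.

```python
import os
import shutil


def find_tesseract(candidates):
    """Return the first existing tesseract executable, else the one on PATH (or None)."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    # Falls back to the PATH entry added in step (2); returns None if not found.
    return shutil.which("tesseract")


def configure_pytesseract(candidates):
    """Point pytesseract at the located executable; raise if none exists."""
    import pytesseract  # imported here so find_tesseract works without it
    path = find_tesseract(candidates)
    if path is None:
        raise FileNotFoundError("tesseract.exe was not found")
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

With the install location from the snippet above, the call would be configure_pytesseract([r'D:\Python\install-depend\Tesseract-OCR\tesseract.exe']).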

    IV. Downloading and installing browser drivers

    Selenium WebDriver downloads
    Browser | Where to check the version | Download / mirror URL | Driver file
    Chrome | chrome://version/ | http://chromedriver.storage.googleapis.com/index.html ; http://npm.taobao.org/mirrors/chromedriver | chromedriver_win32.zip
    Firefox | about:support | https://npm.taobao.org/mirrors/geckodriver ; https://github.com/mozilla/geckodriver/releases | geckodriver-v0.26.0-win64.zip
    Microsoft Edge | edge://version/ | https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/ | edgedriver_win64.zip
    Opera | | https://github.com/operasoftware/operachromiumdriver/releases | operadriver_win64.zip
    Internet Explorer | Settings - About Internet Explorer | http://selenium-release.storage.googleapis.com/index.html | IEDriverServer_x64_3.9.0.zip
    PhantomJS | | https://phantomjs.org/download.html ; https://bitbucket.org/ariya/phantomjs/downloads | phantomjs-2.1.1-windows.zip
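
Once a driver archive is unzipped, Selenium needs the path of the driver executable. The sketch below assumes the selenium 3.141.0 API from the module table, where webdriver.Chrome and webdriver.Firefox accept an executable_path argument. The driver folder and the helper names are hypothetical.

```python
import os

# Executable names found inside the Chrome and Firefox driver archives.
DRIVER_FILES = {
    "chrome": "chromedriver.exe",
    "firefox": "geckodriver.exe",
}


def resolve_driver(driver_dir, browser):
    """Return the full path of the driver executable, failing early if it is missing."""
    path = os.path.join(driver_dir, DRIVER_FILES[browser])
    if not os.path.isfile(path):
        raise FileNotFoundError("driver not found: " + path)
    return path


def open_browser(driver_dir, browser="chrome"):
    """Start a browser session with the locally stored driver (selenium 3.x style)."""
    from selenium import webdriver  # imported lazily so resolve_driver works without it
    path = resolve_driver(driver_dir, browser)
    if browser == "chrome":
        return webdriver.Chrome(executable_path=path)
    return webdriver.Firefox(executable_path=path)
```

For example, driver = open_browser(r"D:\Python\drivers") would start Chrome, provided the driver version matches the installed browser version shown on chrome://version/.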

    V. Verifying versions and dependencies

    python -V
    # Python 3.6.8
    python
    import platform
    platform.architecture()
    # ('64bit', 'WindowsPE')
    import requests
    import lxml
    import selenium
    from PIL import Image
    import pytesseract
    import tesserocr
    import cv2 as cv
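
Typing the imports one by one stops at the first failure. As a small sketch, the same check can be run over the whole list with importlib so that every missing module is reported in one pass. The function name is made up for illustration; the module list mirrors the imports above.

```python
import importlib

# Import names used in the manual check above (PIL and cv2 are import names, not package names).
MODULES = ("requests", "lxml", "selenium", "PIL", "pytesseract", "tesserocr", "cv2")


def check_modules(names=MODULES):
    """Try to import each module and report its version, or the import error."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError as exc:
            report[name] = "MISSING: {}".format(exc)
        else:
            # Not every module exposes __version__, so fall back to a plain marker.
            report[name] = getattr(module, "__version__", "installed")
    return report
```

Printing the items of check_modules() gives one line per module, which makes it easy to see which wheel still has to be installed.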
