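Downloading from 原创力文档 (book118.com) requires payment, so here is a script that downloads the preview images and saves them as a PDF (works only for documents whose original is a PDF).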
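```python
import json
import os
import re
import shutil
import time

import img2pdf
import requests

# Fetch the document page and pull the preview parameters out of its inline JavaScript.
# The Chinese text in the patterns matches comments in the page source, so leave it as-is.
url_input = input("Enter the URL of the document to download: ")
s = requests.get(url=url_input).text
title = re.findall('<h2>(.*?)</h2>', s, re.S)[0].strip()
view_token = re.findall('view_token: (.*?) //预览的token', s, re.S)[0][1:-1]  # drop the quotes
aid = re.findall('office.*?aid: (.*?), //解密', s, re.S)[0]
project_id = "1"
t = re.findall('senddate: (.*?),', s, re.S)[0][1:-1]  # drop the quotes
page_count = int(re.findall('actual_page: (.*?), //真实页数', s, re.S)[0])

# Preview API URL; the page number and a millisecond timestamp are appended per request.
times = str(round(time.time() * 1000))
base_url = ("https://openapi.book118.com/getPreview.html?&project_id=" + project_id
            + "&aid=" + aid + "&t=" + t + "&view_token=" + view_token)

os.makedirs("img", exist_ok=True)  # the original assumed img/ already existed
a = []
num = 1
while num <= page_count:
    new_url = base_url + "&page=" + str(num) + "&_=" + times
    num += 6  # each request returns up to 6 pages
    response = requests.get(url=new_url).text
    response = response[response.find('{'):-2]  # strip the JSONP wrapper
    time.sleep(10)  # pause between requests to get past the anti-scraping check
    response = json.loads(response)['data']
    for i in response:  # i is the page number, response[i] a protocol-relative image URL
        img_url = "https:" + response[i]
        print(img_url)
        img_data = requests.get(url=img_url).content
        img_name = "img/" + i + ".jpg"
        with open(img_name, 'wb') as fp:
            fp.write(img_data)
        a.append(img_name)

# Combine the images, in page order, into one PDF named after the document title.
if not title.lower().endswith(".pdf"):
    title += ".pdf"
with open(title, "wb") as f:
    f.write(img2pdf.convert(a))

# Remove the downloaded images.
shutil.rmtree('img')
```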
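The line `response[response.find('{'):-2]` strips what appears to be a JSONP wrapper (a callback name, an opening parenthesis, and a trailing `);`) from the preview API's reply. Below is a minimal sketch of that step as standalone functions, assuming the reply has the shape `callback({...});` with a `data` object mapping page numbers to protocol-relative image URLs, which is what the script implies. `unwrap_jsonp` and `preview_images` are hypothetical helper names, and the callback name and URLs in the test are made up. Slicing up to the last `}` instead of dropping exactly two characters keeps working if the trailing characters ever differ.

```python
import json
from typing import Any, Dict, List, Tuple


def unwrap_jsonp(text: str) -> Dict[str, Any]:
    """Parse the JSON object inside a JSONP-style reply such as 'callback({...});'.

    Assumption: the payload is a single JSON object, so everything from the
    first '{' to the last '}' is handed to json.loads.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    return json.loads(text[start:end + 1])


def preview_images(text: str) -> List[Tuple[str, str]]:
    """Return (page, absolute image URL) pairs from the reply's 'data' field.

    The URLs are protocol-relative ('//...'), so 'https:' is prepended,
    as the original script does.
    """
    data = unwrap_jsonp(text)["data"]
    return [(page, "https:" + path) for page, path in data.items()]
```

Because `json.loads` preserves key order, the pairs come back in the order the server listed the pages, which is the order the script appends them for `img2pdf`.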
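Install the two third-party libraries the script uses (requests and img2pdf; the other modules are in the standard library), then run it, enter the URL of the document you want, and wait: the PDF is written to the current directory.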
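Note: if it runs too slowly or throws an error, the only thing to change is the `time.sleep(10)` call inside the loop. The site has an anti-scraping mechanism that blocks repeated requests in quick succession, so the script pauses 10 seconds per round. The longer the pause, the more reliable the download, but in practice about 3 seconds is usually enough.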
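To choose a delay it helps to count requests: the loop advances 6 pages per request, so a 100-page document takes 17 requests, which is about 170 s of pauses at 10 s each and about 51 s at 3 s. The sketch below puts that arithmetic in `request_count` and adds `polite_call`, a hypothetical helper that is not part of the original script: it keeps a fixed pause after each successful call and, when a call raises, retries with a doubling wait. The retry-with-backoff behavior is an added assumption, not something the author's script does; the `sleep` parameter exists only so the timing can be tested without actually waiting.

```python
import math
import time
from typing import Callable, TypeVar

T = TypeVar("T")

PAGES_PER_REQUEST = 6  # the original loop does num += 6 per request


def request_count(total_pages: int) -> int:
    """Number of preview requests the loop makes for a document of total_pages pages."""
    return math.ceil(total_pages / PAGES_PER_REQUEST)


def polite_call(fn: Callable[[], T], delay: float = 3.0, retries: int = 3,
                sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn() with a fixed pause afterwards; on an exception, retry with doubling waits.

    Hypothetical helper: the first retry waits `delay` seconds, the next 2 * delay,
    and so on. After `retries` failed retries the last exception is re-raised.
    """
    wait = delay
    for attempt in range(retries + 1):
        try:
            result = fn()
        except Exception:
            if attempt == retries:
                raise
            sleep(wait)
            wait *= 2
        else:
            sleep(delay)  # space out the next request even after a success
            return result
    # Reached only when retries < 0, i.e. no attempt was made at all.
    raise RuntimeError("retries must be >= 0")
```

In the script, the request and the `json.loads` line could both go inside the function passed to `polite_call`, replacing the fixed `time.sleep(10)`, so that an error page from the anti-scraping check is retried instead of stopping the download.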