Python's urllib library is used to work with URLs and to fetch and process the content of web pages.
1.urllib
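As a quick orientation, urllib is a package of submodules: urllib.request opens URLs, urllib.parse splits and encodes them, and urllib.error holds the exceptions raised by urllib.request. The sketch below uses only urllib.parse, so it runs without network access. The helper name describe_url is made up for illustration; the URL is the one fetched in the GET example.

```python
from urllib import parse


def describe_url(url):
    """Split a URL into its parts and decode the percent-encoded path."""
    parts = parse.urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.netloc,
        "path": parse.unquote(parts.path),
    }


if __name__ == "__main__":
    url = "https://www.simoniu.com/commons/items/catalog/pager/%E6%9C%8D%E9%A5%B0/1"
    print(describe_url(url))
    # quote() performs the reverse step: non-ASCII text becomes UTF-8 percent escapes
    print(parse.quote("服饰"))
```

Percent-encoding is why the catalog name in the URL appears as %E6%9C%8D%E9%A5%B0 rather than as readable text.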

2.GET
Example:
Fetch one of this site's URLs, https://www.simoniu.com/commons/items/catalog/pager/%E6%9C%8D%E9%A5%B0/1, and print the response.
# -*- coding: utf-8 -*-
# @Time : 2022/4/27 11:40
# @File : urllibdemo.py
# @Software : PyCharm
from urllib import request

# The path segment %E6%9C%8D%E9%A5%B0 is the percent-encoded form of "服饰"
with request.urlopen('https://www.simoniu.com/commons/items/catalog/pager/%E6%9C%8D%E9%A5%B0/1') as f:
    # Read the raw response body as bytes
    data = f.read()
    print('Status:', f.status, f.reason)
    # Print every response header
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    print('Data:', data.decode('utf-8'))
Output:
Status: 200
Server: nginx/1.17.6
Date: Wed, 27 Apr 2022 03:43:14 GMT
Content-Type: application/json
Transfer-Encoding: chunked
Connection: close
Vary: Origin
Vary: Access-Control-Request-Method
Vary: Access-Control-Request-Headers
Data: {"code":200,"msg":"查询商品列表成功!","data":[{"version":0,"createTime":"2019-10-12 11:49:21","modifyTime":null,"flag":true,"id":11,"name":"2017夏季新款韩版女装雪纺无袖白色连衣裙夏修身显瘦打底a字裙子","pic":"https://img.simoniu.com/good_9.jpg","price":0.07,"number":1000,"catalog":"服饰","buyNum":0,"city":"宁波","property":"雪纺","province":"浙江","shopId":100,"shopName":"韩都衣舍旗舰店","discount":1,"buyCount":1212,"status":1,"webSiteId":2,"freePost":false},{"version":0,"createTime":"2019-10-12 11:50:10","modifyTime":null,"flag":true,"id":12,"name":"夏季2017新款小清新露肩雪纺连衣裙夏女装韩版显瘦气质a字裙子","pic":"https://img.simoniu.com/good_10.jpg","price":0.02,"number":1000,"catalog":"服饰","buyNum":0,"city":"广州","property":"雪纺","province":"广东","shopId":100,"shopName":"韩都衣舍旗舰店","discount":1,"buyCount":8736,"status":1,"webSiteId":2,"freePost":false},{"version":0,"createTime":"2019-10-12 12:27:17","modifyTime":null,"flag":true,"id":13,"name":"2017新款女装夏季雪纺连衣裙韩版收腰显瘦气质荷叶边系带印花中裙","pic":"https://img.simoniu.com/good_11.jpg","price":0.05,"number":1000,"catalog":"服饰","buyNum":0,"city":"杭州","property":"雪纺","province":"浙江","shopId":100,"shopName":"韩都衣舍旗舰店","discount":1,"buyCount":3998,"status":1,"webSiteId":2,"freePost":false},{"version":0,"createTime":"2019-10-12 12:48:18","modifyTime":null,"flag":true,"id":14,"name":"水墨青华2017夏装新款女装气质时尚通勤短袖修身中长款印花连衣裙","pic":"https://img.simoniu.com/good_1.jpg","price":0.1,"number":1000,"catalog":"服饰","buyNum":0,"city":"深圳","property":"雪纺","province":"广东","shopId":100,"shopName":"韩都衣舍旗舰店","discount":1,"buyCount":262,"status":1,"webSiteId":2,"freePost":false},{"version":0,"createTime":"2019-10-12 12:49:40","modifyTime":null,"flag":true,"id":15,"name":"歌兔连衣裙雪纺夏季长裙修身2017新款女装显瘦碎花小清新裙子夏女","pic":"https://img.simoniu.com/good_12.jpg","price":0.04,"number":1000,"catalog":"服饰","buyNum":0,"city":"深圳","property":"雪纺","province":"广东","shopId":100,"shopName":"韩都衣舍旗舰店","discount":1,"buyCount":5921,"status":1,"webSiteId":2,"freePost":false},{"version":0,"createTime":"2019-10-12 
12:50:45","modifyTime":null,"flag":true,"id":16,"name":"长裙女夏季2017新款女装韩版大码收腰显瘦气质印花无袖雪纺连衣裙","pic":"https://img.simoniu.com/good_13.jpg","price":0.09,"number":1000,"catalog":"服饰","buyNum":0,"city":"东莞","property":"雪纺","province":"广东","shopId":100,"shopName":"韩都衣舍旗舰店","discount":1,"buyCount":2432,"status":1,"webSiteId":2,"freePost":false},{"version":0,"createTime":"2019-10-12 12:51:30","modifyTime":null,"flag":true,"id":17,"name":"新款莫代尔长裙夏季短袖大码女装宽松显瘦大摆连衣裙沙滩度假裙","pic":"https://img.simoniu.com/good_14.jpg","price":0.08,"number":1000,"catalog":"服饰","buyNum":0,"city":"东莞","property":"雪纺","province":"广东","shopId":100,"shopName":"韩都衣舍旗舰店","discount":1,"buyCount":1333,"status":1,"webSiteId":2,"freePost":false},{"version":0,"createTime":"2019-10-12 12:52:15","modifyTime":null,"flag":true,"id":18,"name":"棉麻连衣裙女夏中长款短袖休闲夏天亚麻女装长裙修身显瘦夏季裙子","pic":"https://img.simoniu.com/good_15.jpg","price":0.01,"number":1000,"catalog":"服饰","buyNum":0,"city":"东莞","property":"雪纺","province":"广东","shopId":100,"shopName":"韩都衣舍旗舰店","discount":1,"buyCount":877,"status":1,"webSiteId":2,"freePost":false},{"version":0,"createTime":"2019-10-12 12:53:46","modifyTime":null,"flag":true,"id":19,"name":"改良旗袍连衣裙文艺范民族风刺绣2017夏季棉麻女装显瘦短袖中长款","pic":"https://img.simoniu.com/good_16.jpg","price":0.1,"number":1000,"catalog":"服饰","buyNum":0,"city":"青岛","property":"雪纺","province":"山东","shopId":100,"shopName":"韩都衣舍旗舰店","discount":1,"buyCount":2143,"status":1,"webSiteId":2,"freePost":false},{"version":0,"createTime":"2019-10-12 12:54:29","modifyTime":null,"flag":true,"id":20,"name":"颜域品牌女装2017夏季新款显瘦娃娃领蝴蝶结系带短袖蕾丝连衣裙","pic":"https://img.simoniu.com/good_17.jpg","price":0.11,"number":1000,"catalog":"服饰","buyNum":0,"city":"青岛","property":"雪纺","province":"山东","shopId":100,"shopName":"韩都衣舍旗舰店","discount":1,"buyCount":5433,"status":1,"webSiteId":2,"freePost":false}]}
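The Data field above is a JSON document, so the standard json module can turn it into Python objects. The following is a minimal sketch and not part of the original script: the helper name summarize_items is hypothetical, and the sample body is a shortened stand-in with placeholder names that copies only the field names code, msg, data, id, name, and price from the response shown above.

```python
import json


def summarize_items(body):
    """Decode a UTF-8 JSON response body and return (id, name, price) tuples."""
    payload = json.loads(body.decode("utf-8"))
    if payload.get("code") != 200:
        raise ValueError("unexpected response code: %r" % payload.get("code"))
    return [(item["id"], item["name"], item["price"]) for item in payload["data"]]


if __name__ == "__main__":
    # Shortened stand-in for the bytes returned by f.read() in the GET example
    sample = json.dumps({
        "code": 200,
        "msg": "ok",
        "data": [
            {"id": 11, "name": "dress A", "price": 0.07},
            {"id": 12, "name": "dress B", "price": 0.02},
        ],
    }).encode("utf-8")
    for item_id, name, price in summarize_items(sample):
        print(item_id, name, price)
```

In the real script, the bytes returned by f.read() would be passed to the helper in place of the sample.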
3.POST
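To send a POST request, pass the request body as bytes through the data parameter.
The example below sends a POST request to simulate publishing a new post.
Example: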
import json
from urllib import request

article_data = {
    "uid": 100,
    "title": '测试文章',
    "body": "<h1>hello,world!</h1>"
}
# Serialize the dict to a JSON string
article_data = json.dumps(article_data)
req = request.Request('http://jsonplaceholder.typicode.com/posts')
# Send an iPhone-style User-Agent so the request looks like it comes from a phone browser
req.add_header('User-Agent',
               'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
# Passing data makes this a POST; with no Content-Type header set,
# urllib labels the body application/x-www-form-urlencoded
with request.urlopen(req, data=article_data.encode('utf-8')) as f:
    print('Status:', f.status, f.reason)
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    print('Data:', f.read().decode('utf-8'))
Output:
Status: 201 Created
Date: Wed, 27 Apr 2022 03:58:36 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 121
Connection: close
X-Powered-By: Express
X-Ratelimit-Limit: 1000
X-Ratelimit-Remaining: 999
X-Ratelimit-Reset: 1651031944
Vary: Origin, X-HTTP-Method-Override, Accept-Encoding
Access-Control-Allow-Credentials: true
Cache-Control: no-cache
Pragma: no-cache
Expires: -1
Access-Control-Expose-Headers: Location
Location: http://jsonplaceholder.typicode.com/posts/101
X-Content-Type-Options: nosniff
Etag: W/"79-8iUJHoOPsa52UxApirDwM/2nWUA"
Via: 1.1 vegur
CF-Cache-Status: DYNAMIC
Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=LNkv%2B8BZMj%2FNrhlbCj4ZSXVM7YeHhQCilXlLRz%2F%2B%2B1V0l8WcnNd1l3Q39XbDG9WgnNasKb%2BQgrceoUdtsEHb83XzHyUfUnndXwGJ6IByjbvyMrmOabgsMkf9Jz1QNKXYl7s8cmR3D%2FdZqRfjCuMjY2%2BFvSpquWX4A4tO"}],"group":"cf-nel","max_age":604800}
NEL: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
Server: cloudflare
CF-RAY: 7024a4047be90d10-LAX
alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400
Data: {
"{\"uid\": 100, \"title\": \"\\u6d4b\\u8bd5\\u6587\\u7ae0\", \"body\": \"<h1>hello,world!</h1>\"}": "",
"id": 101
}
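In the output above, the server echoes the whole JSON string as a single key with an empty value. That is consistent with the body being sent without a Content-Type header: urllib then labels it application/x-www-form-urlencoded, and the server appears to parse it as form data. A minimal sketch of one way to send JSON explicitly follows. The helper name build_json_request is hypothetical, and the request is only constructed here, not sent, so the snippet runs offline.

```python
import json
from urllib import request


def build_json_request(url, payload):
    """Build a POST request whose body is JSON and is labeled as such."""
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        url,
        data=body,  # a non-None data makes urllib use POST
        headers={"Content-Type": "application/json; charset=utf-8"},
    )


if __name__ == "__main__":
    req = build_json_request(
        "http://jsonplaceholder.typicode.com/posts",
        {"uid": 100, "title": "test article", "body": "<h1>hello,world!</h1>"},
    )
    # Request stores header names capitalized, hence 'Content-type'
    print(req.get_method(), req.get_header("Content-type"))
```

Passing the returned object to request.urlopen sends it in the same way as in the example above.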
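Summary: urllib lets a program issue all kinds of HTTP requests. To imitate a browser for a particular task, the request has to be disguised as one sent by a browser: first observe the requests the browser sends, then copy its request headers. The User-Agent header is the one that identifies the browser.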
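To make the last point concrete, the sketch below compares the User-Agent that urllib announces by default with one copied from a browser. It is an illustration under assumptions: the helper names default_user_agent and browser_like_request are hypothetical, example.com is a placeholder URL, and the User-Agent string is the one used in the POST example. Nothing is sent over the network.

```python
from urllib import request

IPHONE_UA = (
    "Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) "
    "AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 "
    "Mobile/10A5376e Safari/8536.25"
)


def default_user_agent():
    """Return the User-Agent a fresh urllib opener sends when none is set."""
    opener = request.build_opener()
    return dict(opener.addheaders)["User-agent"]


def browser_like_request(url, user_agent=IPHONE_UA):
    """Build a request that carries a browser-style User-Agent header."""
    return request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    # Typically prints something like Python-urllib/3.x, which servers can easily spot
    print(default_user_agent())
    req = browser_like_request("https://example.com/")
    print(req.get_header("User-agent"))
```

A header set on the Request takes the place of the opener's default, so the server sees the browser-style value.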