reptile/README.md
2024-08-22 19:24:47 +08:00

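## How to run?
`cd` into the project root and run
`pip install -r requirements.txt`
to install the required dependencies.
Then run `scrape.py` first and `main_extraction.py` second; together they form the complete workflow.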
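
The two-step order above could also be wrapped in a small launcher. This is only a sketch, not part of the project: the names `PIPELINE`, `build_commands` and `run_pipeline` are hypothetical, and it assumes both scripts sit in the project root and need no command-line arguments.

```python
import subprocess
import sys

# Order taken from the instructions above: collect the URLs first, then extract.
PIPELINE = ["scrape.py", "main_extraction.py"]


def build_commands(python=sys.executable):
    """Return one command per script, in the order they should run."""
    return [[python, script] for script in PIPELINE]


def run_pipeline():
    """Run each script in turn from the current working directory."""
    for cmd in build_commands():
        # check=True raises CalledProcessError if a step exits with a
        # non-zero status, so the extraction step never runs after a
        # failed scrape.
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` from the project root would then be equivalent to running the two scripts by hand.
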
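### Main scripts
- `scrape.py` scrapes the URLs of the penalty information disclosure tables and saves them to a txt file
- `main_extraction.py` reads the URLs from the txt file and scrapes the page content; URLs that fail are saved to `error_urls.txt`, and successful results are appended to `output_data*.xlsx`
- `标题网址提取.py` scrapes the penalty decisions and the penalty information disclosure tables and saves them to an Excel file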
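
The success/failure split described for `main_extraction.py` can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the script's actual code: `read_urls`, `fetch_page` and `process_urls` are hypothetical names, the txt file is assumed to hold one URL per line, and writing to `output_data*.xlsx` is left out because it would need a third-party library.

```python
from pathlib import Path
from urllib.request import urlopen


def read_urls(path):
    """Read one URL per line, skipping blank lines (assumed file layout)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def fetch_page(url, timeout=10.0):
    """Download a page and return its text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def process_urls(urls, fetch=fetch_page, error_file="error_urls.txt"):
    """Fetch every URL; collect successes and record the failures."""
    rows = []    # successful results, one dict per page
    failed = []  # URLs whose fetch raised an exception
    for url in urls:
        try:
            rows.append({"url": url, "content": fetch(url)})
        except Exception:
            failed.append(url)
    if failed:
        # Persist the failed URLs so they can be retried later.
        Path(error_file).write_text("\n".join(failed) + "\n", encoding="utf-8")
    return rows, failed
```

Passing `fetch` as a parameter keeps the loop testable without network access; the returned `rows` would be what gets appended to the spreadsheet.
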

Contact: QQ 646228430