2.12 Use qwen-turbo as the fallback when qwen-plus exceeds its rate limit

parent 86fe5b97e0
commit 0388beaeaa

README.md | 72
@@ -84,36 +84,70 @@ bid-assistance/test 里面找个文件的url,推荐'094定稿-湖北工业大

### Where the project applies limits

- **LLM limits**
+ #### **Account and server load splitting**

Server splitting: the Linux and Windows servers currently split the load mainly at the hardware level (file splitting is CPU-bound); the underlying LLM calls still go to Alibaba and share the same TPM/QPM quota.

Account splitting: in qianwen_plus:

```
api_keys = cycle([
    os.getenv("DASHSCOPE_API_KEY"),
    # os.getenv("DASHSCOPE_API_KEY_BACKUP1"),
    # os.getenv("DASHSCOPE_API_KEY_BACKUP2")
])
api_keys_lock = threading.Lock()
def get_next_api_key():
    with api_keys_lock:
        return next(api_keys)

api_key = get_next_api_key()
```

Just rotate through different api_keys. Not enabled yet.

#### **LLM limits**

doubao.py and 通义千问long_plus.py under general/llm

**Currently one instance is deployed on Linux and one on Windows, so each instance gets half of the project's QPS, i.e. calls=?**

- 1. The qianwen-long limit (Alibaba's QPM is 1200; bid generation and parsing split it in half, 600 each, i.e. 10 per second; halved again between the Linux and Windows servers, that is 5.)
+ 1. The qianwen-long limit (Alibaba's QPM is 1200, i.e. 20 per second; halved between the Linux and Windows servers, that is 10. TPM has no cap.)

```
@sleep_and_retry
- @limits(calls=5, period=1) # 每秒最多调用4次
+ @limits(calls=10, period=1) # 每秒最多调用10次
def rate_limiter():
    pass # 这个函数本身不执行任何操作,只用于限流
```

- 2. The qianwen-plus limit (TPM is 1000万; at about 2万 tokens per request, a combined Linux + Windows QPS of 8 gives 8 × 60 × 2 = 960万 < 1000万.)
+ 2. The qianwen-plus limit (TPM is 1000万; at about 2万 tokens per request, a combined Linux + Windows QPS of 8 gives 8 × 60 × 2 = 960万 < 1000万; 4 per server.)

**Measured on 2.11: with calls=4 the peak TPM was 800万, so the current stable version sets calls=5.**

**2.12: with turbo taking the overflow, calls is now set to 7.**

```
@sleep_and_retry
- @limits(calls=4, period=1) # 每秒最多调用4次
+ @limits(calls=7, period=1) # 每秒最多调用7次
def qianwen_plus(user_query, need_extra=False):
    logger = logging.getLogger('model_log') # 通过日志名字获取记录器
```

3. The qianwen_turbo limit (TPM is 500万; since it is only the fallback after plus, be conservative: QPS is set to 6, split across the two servers, i.e. calls=3.)

```
@sleep_and_retry
@limits(calls=3, period=1) # 500万tpm,每秒最多调用6次,两个服务器分流就是3次 (plus超限后的保底手段,稳妥一点)
```

**Important!!** When Alibaba raises the quota later, scale the **calls=?** values here proportionally.

If the Linux/Windows load balancing is not used, these calls values must also be doubled!!

- **API endpoint limits**
+ #### **API endpoint limits**

1. The create_app() function in start_up.py limits each endpoint to 100 concurrent requests. In practice this is no longer the real constraint (100 is large enough); the effective limiting happens at the LLM layer.

@@ -132,12 +166,9 @@ app.connection_limiters['upload'] = ConnectionLimiter(max_connections=100)
def zbparse():
```

This limits how long each endpoint may run internally, currently set to 30 minutes (queueing time not included); exceeding it counts as a parse failure.
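A minimal sketch of how such a per-endpoint execution timeout can be enforced (illustrative only; it uses concurrent.futures and hypothetical names, the actual mechanism in start_up.py may differ):

```
import concurrent.futures

EXECUTION_TIMEOUT = 30 * 60  # 30 minutes; queueing time is not counted here

def run_with_timeout(parse_func, *args):
    """Run the parse in a worker thread and treat a timeout as a parse failure."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(parse_func, *args)
        try:
            return future.result(timeout=EXECUTION_TIMEOUT)
        except concurrent.futures.TimeoutError:
            # same outcome the README describes: timing out counts as a parse failure
            return {"status": "error", "message": "解析失败:超过30分钟"}
```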
- **Backend-side limits:**
+ #### **Backend-side limits:**

When the backend sends parsing requests for tender files, any requests beyond 100 (max_connections=100) are queued on my side. The backend's timer nevertheless counts those queued requests as "being parsed" even though they are still waiting. In the extreme case where new files arrive faster than they are parsed, the queue keeps growing and the later files fail simply because they waited too long, instead of ending with a normal 'parse failure'.
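Conceptually, a ConnectionLimiter of this kind is a counting semaphore: request number 101 blocks before any work starts, so from the backend's point of view it looks exactly like a request that is already parsing. A hypothetical sketch (not the project's actual class):

```
import threading

class ConnectionLimiter:
    """Let at most max_connections requests into a handler at once; the rest block and queue."""
    def __init__(self, max_connections=100):
        self._sem = threading.Semaphore(max_connections)

    def __enter__(self):
        # request #101 waits here, while the backend's timer for it is already running
        self._sem.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._sem.release()
```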
@@ -354,4 +385,23 @@ start_up.py是启动脚本,run_serve也是启动脚本,是对start_up.py的



This is the parsed result, suitable for display in the frontend. To generate the business/technical deviation review table, however, the data needs one more LLM call to re-summarize it, ideally into a list of strings, before passing it to the backend. (Not done yet.)


### How to locate a problem

1. Look at the output folder under static (the big "upload" parse corresponds to output1).
2. The docker-compose file defines the volume mount: - /home/Z/zbparse_output_dev:/flask_project/flask_app/static/output
   In other words, static/output is mapped to /home/Z/zbparse_output_dev on the server.
3. Find the right subfolder by time (subfolders are named by uuid).
4. Check whether final_result.json exists. If it does, the parsing pipeline finished normally and the problem is probably on the backend side (a. the backend request exceeded the 30-minute limit; b. post-processing failed while storing the parsed data).

   The problem can also lie in the parsing itself; check log.txt inside the subfolder for the logs.

5. If parsing finished (final_result exists) but the result is inaccurate, narrow it down as follows (a small helper sketch follows this list):

   a. Check whether the file splitting in the subfolder is accurate. For example, if the evaluation method is wrong, look at ztbfile_evaluation_method and verify that the scoring details were actually cut out. If they were, adjust the prompt in general/商务技术评分提取; otherwise fix the regular expression for '评标办法' in the PDF truncation code.

   b. In short: **first check whether the splitting is correct, then whether the prompt can be improved**; in either case, trace it to the corresponding code!
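A small convenience sketch for steps 3 and 4 (the host path comes from the volume mount above; the output1 subpath is an assumption, adjust to your layout):

```
import os
import glob

OUTPUT_ROOT = "/home/Z/zbparse_output_dev/output1"  # assumed host-side location of static/output/output1

# list the most recent uuid subfolders and whether they produced final_result.json
subdirs = sorted(glob.glob(os.path.join(OUTPUT_ROOT, "*")), key=os.path.getmtime, reverse=True)
for d in subdirs[:5]:
    ok = os.path.exists(os.path.join(d, "final_result.json"))
    print(d, "final_result.json found" if ok else "no final_result.json -> check log.txt")
```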
@@ -16,7 +16,8 @@ services:
        cpus: 4.0              # 限制容器使用4个CPU核心
    security_opt:
      - seccomp:unconfined
# 可选:定义网络或其他全局配置
# networks:
#   default:
#     driver: bridge

# 如果是单独的服务器,注释掉以下配置,不做限制:
#       mem_limit: "12g"       # 容器最大可使用内存为12GB
#       mem_reservation: "4g"  # 容器保证可使用内存为4GB
#       cpus: 4.0              # 限制容器使用4个CPU核心
@@ -1,99 +1,10 @@
import os
import time
import PyPDF2
import requests
from ratelimit import sleep_and_retry, limits
from flask_app.general.读取文件.clean_pdf import extract_common_header, clean_page_content

def pdf2txt(file_path):
    common_header = extract_common_header(file_path)
    # print(f"公共抬头:{common_header}")
    # print("--------------------正文开始-------------------")
    result = ""
    with open(file_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        num_pages = len(reader.pages)
        # print(f"Total pages: {num_pages}")
        for page_num in range(num_pages):
            page = reader.pages[page_num]
            text = page.extract_text()
            if text:
                # print(f"--------第{page_num}页-----------")
                cleaned_text = clean_page_content(text, common_header)
                # print(cleaned_text)
                result += cleaned_text
                # print(f"Page {page_num + 1} Content:\n{cleaned_text}")
            else:
                print(f"Page {page_num + 1} is empty or text could not be extracted.")
    directory = os.path.dirname(os.path.abspath(file_path))
    output_path = os.path.join(directory, 'extract.txt')
    # 将结果保存到 extract.txt 文件中
    try:
        with open(output_path, 'w', encoding='utf-8') as output_file:
            output_file.write(result)
        print(f"提取内容已保存到: {output_path}")
    except IOError as e:
        print(f"写入文件时发生错误: {e}")
    # 返回保存的文件路径
    return output_path

def read_txt_to_string(file_path):
    """
    读取txt文件内容并返回一个包含所有内容的字符串,保持原有格式。

    参数:
    - file_path (str): txt文件的路径

    返回:
    - str: 包含文件内容的字符串
    """
    try:
        with open(file_path, 'r', encoding='utf-8') as file:  # 确保使用适当的编码
            content = file.read()  # 使用 read() 保持文件格式
            return content
    except FileNotFoundError:
        return "错误:文件未找到。"
    except Exception as e:
        return f"错误:读取文件时发生错误。详细信息:{e}"

def get_total_tokens(text):
    """
    调用 API 计算给定文本的总 Token 数量。 注:doubao的计算方法!与qianwen不一样
    返回:
    - int: 文本的 total_tokens 数量。
    """
    # API 请求 URL
    url = "https://ark.cn-beijing.volces.com/api/v3/tokenization"

    # 获取 API 密钥
    doubao_api_key = os.getenv("DOUBAO_API_KEY")
    if not doubao_api_key:
        raise ValueError("DOUBAO_API_KEY 环境变量未设置")

    # 请求头
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + doubao_api_key
    }
    model = "ep-20241119121710-425g6"
    # 请求体
    payload = {
        "model": model,
        "text": [text]  # API 文档中要求 text 是一个列表
    }

    try:
        response = requests.post(url, headers=headers, json=payload)
        response.raise_for_status()
        response_data = response.json()
        total_tokens = response_data["data"][0]["total_tokens"]
        return total_tokens
    except Exception as e:
        print(f"获取 Token 数量失败:{e}")
        return 0

from flask_app.general.llm.大模型通用函数 import get_total_tokens
@sleep_and_retry
- @limits(calls=10, period=1)  # 每秒最多调用10次
+ @limits(calls=2, period=1)  # tpm=300万,每秒最多调用4次,目前两个服务器分流就是2次
def doubao_model(full_user_query, need_extra=False):
    """
    对于429错误,一共尝试三次,前两次等待若干时间再发起调用,第三次换模型
@@ -172,6 +83,7 @@ def doubao_model(full_user_query, need_extra=False):
            # 如果是 429 错误
            if status_code == 429:
                if attempt < max_retries_429:
                    wait_time = 1
                    if attempt == 0:
                        wait_time = 3
                    elif attempt == 1:
@@ -206,22 +118,6 @@ def doubao_model(full_user_query, need_extra=False):
    else:
        return None

def generate_full_user_query(file_path, prompt_template):
    """
    根据文件路径和提示词模板生成完整的user_query。

    参数:
    - file_path (str): 需要解析的文件路径。
    - prompt_template (str): 包含{full_text}占位符的提示词模板。

    返回:
    - str: 完整的user_query。
    """
    # 假设extract_text_by_page已经定义,用于提取文件内容
    full_text = read_txt_to_string(file_path)
    # 格式化提示词,将提取的文件内容插入到模板中
    user_query = prompt_template.format(full_text=full_text)
    return user_query

if __name__ == "__main__":
    txt_path = r"output.txt"
@@ -2,7 +2,8 @@
import concurrent.futures
import json
from flask_app.general.json_utils import clean_json_string
- from flask_app.general.llm.通义千问long_plus import qianwen_long, qianwen_long_stream, qianwen_plus
+ from flask_app.general.llm.通义千问long import qianwen_long, qianwen_long_stream
+ from flask_app.general.llm.qianwen_plus import qianwen_plus

def generate_continue_query(original_query, original_answer):
    """
flask_app/general/llm/qianwen_plus.py | 153 (new file)
@@ -0,0 +1,153 @@
import json
import logging
import os
import threading
import time
from itertools import cycle

from openai import OpenAI
from ratelimit import sleep_and_retry, limits
from flask_app.general.llm.qianwen_turbo import qianwen_turbo
from flask_app.general.llm.大模型通用函数 import extract_error_details, get_total_tokens

# 若多账号实现分流,那么就在这里添加不同的API_KEY,这里要求每个模型的TPM都是一样的!!具体API_KEY写在.env文件中
api_keys = cycle([
    os.getenv("DASHSCOPE_API_KEY"),
    # os.getenv("DASHSCOPE_API_KEY_BACKUP1"),
    # os.getenv("DASHSCOPE_API_KEY_BACKUP2")
])
api_keys_lock = threading.Lock()
def get_next_api_key():
    with api_keys_lock:
        return next(api_keys)

@sleep_and_retry
@limits(calls=7, period=1)  # tpm是1000万,稳定下每秒最多调用10次,两个服务器分流就是5次 2.12日:增加了turbo作为承载超限的部分,可以适当扩大到calls=7
def qianwen_plus(user_query, need_extra=False):
    logger = logging.getLogger('model_log')  # 通过日志名字获取记录器
    # print("call qianwen-plus...")
    """
    使用之前上传的文件,根据用户查询生成响应,并实时显示流式输出。
    目前命中缓存局限:1.不足 256 Token 的内容不会被缓存。
    2.上下文缓存的命中概率并不是100%,即使是上下文完全一致的请求,也存在无法命中的概率,命中概率依据系统判断而定。
    3.若多线程同时请求,缓存无法及时更新!
    参数:
    - user_query: 用户查询
    - need_extra: 是否需要返回额外数据(默认 False)
    返回:
    - 当 need_extra=False 时: 返回响应内容 (str)
    - 当 need_extra=True 时: 返回 (响应内容, token_usage)
    """

    # 内部定义重试参数
    max_retries = 2
    backoff_factor = 2.0
    api_key = get_next_api_key()
    client = OpenAI(
        api_key=api_key,
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
    )

    for attempt in range(1, max_retries + 2):  # +1 是为了包括初始调用
        try:
            completion_tokens = 0  # 初始化 completion_tokens 为 0
            # 生成基于用户查询的响应
            completion = client.chat.completions.create(
                model="qwen-plus",
                temperature=0.5,
                messages=[
                    {
                        'role': 'user',
                        'content': user_query
                    }
                ],
                stream=True,  # 启用流式响应
                stream_options={"include_usage": True}
            )

            full_response = ""  # 用于存储完整的响应内容

            for chunk in completion:
                # 解析 chunk 数据
                if hasattr(chunk, 'to_dict'):
                    chunk_data = chunk.to_dict()
                else:
                    chunk_data = json.loads(chunk.model_dump_json())

                # 处理 usage 信息
                usage = chunk_data.get('usage')
                if usage is not None:
                    completion_tokens = usage.get('completion_tokens', 0)
                    # prompt_tokens_details = usage.get('prompt_tokens_details', {})  # 命中tokens ,取消注释可以print
                    # cache_hit = prompt_tokens_details.get('cached_tokens', 0)
                    # print("命中:"+str(cache_hit))
                # 处理 choices 信息
                choices = chunk_data.get('choices', [])
                if choices:
                    choice = choices[0]
                    delta = choice.get('delta', {})
                    content = delta.get('content', '')
                    if content:
                        full_response += content
                        # 实时打印内容(可以取消注释下面一行以实时输出)
                        # print(content, end='', flush=True)
                    if choice.get('finish_reason'):
                        # 处理完成原因(如果需要)
                        pass  # 或者记录 finish_reason 以供后续使用

            if need_extra:
                return full_response, completion_tokens
            else:
                return full_response

        except Exception as exc:
            # 提取错误代码
            error_code, error_code_string = extract_error_details(str(exc))
            logger.error(f"第 {attempt} 次尝试失败,查询:'{user_query}',错误:{exc}", exc_info=True)

            if error_code == 429:
                if attempt <= max_retries:
                    token_count = get_total_tokens(user_query)
                    if token_count < 90000:  # 如果超过plus的qpm tpm,且user_query的tokens<90000,那么就调用turbo过渡一下。
                        return qianwen_turbo(user_query, need_extra)
                    sleep_time = backoff_factor * (2 ** (attempt - 1))  # 指数退避
                    logger.warning(f"错误代码为 429,将在 {sleep_time} 秒后重试...")
                    time.sleep(sleep_time)
                else:
                    logger.error(f"查询 '{user_query}' 的所有 {max_retries + 1} 次尝试均失败(429 错误)。")
                    break
            else:
                # 针对非 429 错误,只重试一次
                if attempt <= 1:
                    sleep_time = backoff_factor  # 固定等待时间
                    logger.warning(f"遇到非 429 错误(错误代码:{error_code} - {error_code_string}),将等待 {sleep_time} 秒后重试...")
                    time.sleep(sleep_time)
                    continue  # 直接跳到下一次循环(即重试一次)
                else:
                    logger.error(f"查询 '{user_query}' 的所有 {max_retries + 1} 次尝试均失败(错误代码:{error_code} - {error_code_string})。")
                    break

    # 如果所有尝试都失败了,返回空字符串或默认值
    if need_extra:
        return "", 0
    else:
        return ""

if __name__ == "__main__":
    user_query1 = """该招标文件对响应文件(投标文件)偏离项的要求或内容是怎样的?请不要回答具体的技术参数,也不要回答具体的评分要求。请以json格式给我提供信息,外层键名为'偏离',若存在嵌套信息,嵌套内容键名为文件中对应字段或是你的总结,键值为原文对应内容。若文中没有关于偏离项的相关内容,在键值中填'未知'。
禁止内容:
确保键值内容均基于提供的实际招标文件内容,禁止使用任何预设的示例作为回答。
禁止返回markdown格式,请提取具体的偏离相关内容。
示例1,嵌套键值对情况:
{
  "偏离":{
    "技术要求":"以★标示的内容不允许负偏离",
    "商务要求":"以★标示的内容不允许负偏离"
  }
}
示例2,无嵌套键值对情况:
{
  "偏离":"所有参数需在技术响应偏离表内响应,如应答有缺项,且无有效证明材料的,评标委员会有权不予认可,视同负偏离处理"
}
"""
    res = qianwen_plus(user_query1)
    print(res)
flask_app/general/llm/qianwen_turbo.py | 1450 (new file)
(file diff suppressed because it is too large)
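The turbo module's diff is suppressed above, but its interface can be inferred from its callers: qianwen_plus imports qianwen_turbo(user_query, need_extra=False), and the README pins it at @limits(calls=3, period=1). A hedged sketch of what the core entry point presumably looks like (the non-streaming structure is an assumption, not the actual file contents):

```
import os
from openai import OpenAI
from ratelimit import sleep_and_retry, limits

@sleep_and_retry
@limits(calls=3, period=1)  # 500万 TPM fallback, 3 calls/s per server as described in the README
def qianwen_turbo(user_query, need_extra=False):
    """Assumed shape: same contract as qianwen_plus, but on the qwen-turbo model."""
    client = OpenAI(api_key=os.getenv("DASHSCOPE_API_KEY"),
                    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1")
    completion = client.chat.completions.create(
        model="qwen-turbo",
        temperature=0.5,
        messages=[{'role': 'user', 'content': user_query}],
    )
    content = completion.choices[0].message.content
    tokens = completion.usage.completion_tokens if completion.usage else 0
    return (content, tokens) if need_extra else content
```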
@@ -1,37 +1,12 @@
# 基于知识库提问的通用模板,
# assistant_id
import re
import queue
import concurrent.futures
import time

from dashscope import Assistants, Messages, Runs, Threads
from llama_index.indices.managed.dashscope import DashScopeCloudRetriever

from flask_app.general.llm.doubao import read_txt_to_string
from flask_app.general.llm.通义千问long_plus import qianwen_long, upload_file, qianwen_plus

prompt = """
# 角色
你是一个文档处理专家,专门负责理解和操作基于特定内容的文档任务,这包括解析、总结、搜索或生成与给定文档相关的各类信息。

## 技能
### 技能 1:文档解析与摘要
- 深入理解并分析${documents}的内容,提取关键信息。
- 根据需求生成简洁明了的摘要,保持原文核心意义不变。

### 技能 2:信息检索与关联
- 在${documents}中高效检索特定信息或关键词。
- 能够识别并链接到文档内部或外部的相关内容,增强信息的连贯性和深度。

## 限制
- 所有操作均需基于${documents}的内容,不可超出此范围创造信息。
- 在处理敏感或机密信息时,需遵守严格的隐私和安全规定。
- 确保所有生成或改编的内容逻辑连贯,无误导性信息。

请注意,上述技能执行时将直接利用并参考${documents}的具体内容,以确保所有产出紧密相关且高质量。
"""
prom = '请记住以下材料,他们对回答问题有帮助,请你简洁准确地给出回答,不要给出无关内容。${documents}'
from flask_app.general.llm.大模型通用函数 import read_txt_to_string
from flask_app.general.llm.通义千问long import qianwen_long, upload_file
from flask_app.general.llm.qianwen_plus import qianwen_plus

def read_questions_from_file(file_path):
    questions = []
flask_app/general/llm/大模型通用函数.py | 151 (new file)
@@ -0,0 +1,151 @@
import ast
import os
import re
from functools import wraps

import PyPDF2
from ratelimit import sleep_and_retry, limits
import requests
from flask_app.general.读取文件.clean_pdf import extract_common_header, clean_page_content

@sleep_and_retry
@limits(calls=10, period=1)  # qpm=1200,总共每秒最多调用20次,两个服务器分流,每个10
def rate_limiter():
    pass  # 这个函数本身不执行任何操作,只用于限流

# 创建一个共享的装饰器
def shared_rate_limit(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        rate_limiter()  # 通过共享的限流器
        return func(*args, **kwargs)
    return wrapper

def extract_error_details(error_message):
    """
    从错误消息中提取错误代码和内部错误代码。
    假设错误消息的格式包含 'Error code: XXX - {...}'
    """
    # 提取数值型错误代码
    error_code_match = re.search(r'Error code:\s*(\d+)', error_message)
    error_code = int(error_code_match.group(1)) if error_code_match else None

    # 提取内部错误代码字符串(如 'data_inspection_failed')
    error_code_string = None
    error_dict_match = re.search(r'Error code:\s*\d+\s*-\s*(\{.*\})', error_message)
    if error_dict_match:
        error_dict_str = error_dict_match.group(1)
        try:
            # 使用 ast.literal_eval 解析字典字符串
            error_dict = ast.literal_eval(error_dict_str)
            error_code_string = error_dict.get('error', {}).get('code')
            print(error_code_string)
        except Exception as e:
            print(f"解析错误消息失败: {e}")

    return error_code, error_code_string


def get_total_tokens(text):
    """
    调用 API 计算给定文本的总 Token 数量。 注:doubao的计算方法!与qianwen不一样
    返回:
    - int: 文本的 total_tokens 数量。
    """
    # API 请求 URL
    url = "https://ark.cn-beijing.volces.com/api/v3/tokenization"

    # 获取 API 密钥
    doubao_api_key = os.getenv("DOUBAO_API_KEY")
    if not doubao_api_key:
        raise ValueError("DOUBAO_API_KEY 环境变量未设置")

    # 请求头
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + doubao_api_key
    }
    model = "ep-20241119121710-425g6"
    # 请求体
    payload = {
        "model": model,
        "text": [text]  # API 文档中要求 text 是一个列表
    }

    try:
        response = requests.post(url, headers=headers, json=payload)
        response.raise_for_status()
        response_data = response.json()
        total_tokens = response_data["data"][0]["total_tokens"]
        return total_tokens
    except Exception as e:
        print(f"获取 Token 数量失败:{e}")
        return 0

def pdf2txt(file_path):
    common_header = extract_common_header(file_path)
    # print(f"公共抬头:{common_header}")
    # print("--------------------正文开始-------------------")
    result = ""
    with open(file_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        num_pages = len(reader.pages)
        # print(f"Total pages: {num_pages}")
        for page_num in range(num_pages):
            page = reader.pages[page_num]
            text = page.extract_text()
            if text:
                # print(f"--------第{page_num}页-----------")
                cleaned_text = clean_page_content(text, common_header)
                # print(cleaned_text)
                result += cleaned_text
                # print(f"Page {page_num + 1} Content:\n{cleaned_text}")
            else:
                print(f"Page {page_num + 1} is empty or text could not be extracted.")
    directory = os.path.dirname(os.path.abspath(file_path))
    output_path = os.path.join(directory, 'extract.txt')
    # 将结果保存到 extract.txt 文件中
    try:
        with open(output_path, 'w', encoding='utf-8') as output_file:
            output_file.write(result)
        print(f"提取内容已保存到: {output_path}")
    except IOError as e:
        print(f"写入文件时发生错误: {e}")
    # 返回保存的文件路径
    return output_path

def read_txt_to_string(file_path):
    """
    读取txt文件内容并返回一个包含所有内容的字符串,保持原有格式。

    参数:
    - file_path (str): txt文件的路径

    返回:
    - str: 包含文件内容的字符串
    """
    try:
        with open(file_path, 'r', encoding='utf-8') as file:  # 确保使用适当的编码
            content = file.read()  # 使用 read() 保持文件格式
            return content
    except FileNotFoundError:
        return "错误:文件未找到。"
    except Exception as e:
        return f"错误:读取文件时发生错误。详细信息:{e}"

def generate_full_user_query(file_path, prompt_template):
    """
    根据文件路径和提示词模板生成完整的user_query。

    参数:
    - file_path (str): 需要解析的文件路径。
    - prompt_template (str): 包含{full_text}占位符的提示词模板。

    返回:
    - str: 完整的user_query。
    """
    # 假设extract_text_by_page已经定义,用于提取文件内容
    full_text = read_txt_to_string(file_path)
    # 格式化提示词,将提取的文件内容插入到模板中
    user_query = prompt_template.format(full_text=full_text)
    return user_query
@@ -1,15 +1,16 @@
import ast
import json
import logging
import re
import threading
from functools import wraps
from ratelimit import limits, sleep_and_retry
import time
from pathlib import Path
from openai import OpenAI
import os

from flask_app.general.llm.大模型通用函数 import get_total_tokens
from flask_app.general.llm.qianwen_turbo import qianwen_turbo
from flask_app.general.llm.大模型通用函数 import extract_error_details, shared_rate_limit

file_write_lock = threading.Lock()
@sleep_and_retry
@limits(calls=2, period=1)
@@ -42,42 +43,6 @@ def upload_file(file_path,output_folder=""):

    return file_id

@sleep_and_retry
@limits(calls=5, period=1)  # 每秒最多调用4次
def rate_limiter():
    pass  # 这个函数本身不执行任何操作,只用于限流

# 创建一个共享的装饰器
def shared_rate_limit(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        rate_limiter()  # 通过共享的限流器
        return func(*args, **kwargs)
    return wrapper

def extract_error_details(error_message):
    """
    从错误消息中提取错误代码和内部错误代码。
    假设错误消息的格式包含 'Error code: XXX - {...}'
    """
    # 提取数值型错误代码
    error_code_match = re.search(r'Error code:\s*(\d+)', error_message)
    error_code = int(error_code_match.group(1)) if error_code_match else None

    # 提取内部错误代码字符串(如 'data_inspection_failed')
    error_code_string = None
    error_dict_match = re.search(r'Error code:\s*\d+\s*-\s*(\{.*\})', error_message)
    if error_dict_match:
        error_dict_str = error_dict_match.group(1)
        try:
            # 使用 ast.literal_eval 解析字典字符串
            error_dict = ast.literal_eval(error_dict_str)
            error_code_string = error_dict.get('error', {}).get('code')
            print(error_code_string)
        except Exception as e:
            print(f"解析错误消息失败: {e}")

    return error_code, error_code_string
@shared_rate_limit
def qianwen_long(file_id, user_query, max_retries=2, backoff_factor=1.0, need_extra=False):
    logger = logging.getLogger('model_log')  # 通过日志名字获取记录器
@@ -139,7 +104,7 @@ def qianwen_long(file_id, user_query, max_retries=2, backoff_factor=1.0, need_ex
                logger.warning(f"错误代码为 400 - {error_code_string},将调用 qianwen_long_stream 执行一次...")
                try:
                    # 超时就调用 qianwen_long_stream
-                   stream_result = qianwen_long_stream(file_id, user_query, max_retries=0)  # 禁用内部重试
+                   stream_result = qianwen_long_stream(file_id, user_query, max_retries=0, backoff_factor=1, need_extra=need_extra)  # 禁用内部重试
                    if need_extra:
                        if isinstance(stream_result, tuple) and len(stream_result) == 2:
                            return stream_result[0], stream_result[1]  # 返回内容和默认的 token_usage=0
@@ -271,116 +236,6 @@ def qianwen_long_stream(file_id, user_query, max_retries=2, backoff_factor=1.0,
    else:
        return ""

@sleep_and_retry
@limits(calls=4, period=1)  # 每秒最多调用4次
def qianwen_plus(user_query, need_extra=False):
    logger = logging.getLogger('model_log')  # 通过日志名字获取记录器
    print("call qianwen-plus...")

    """
    使用之前上传的文件,根据用户查询生成响应,并实时显示流式输出。
    目前命中缓存局限:1.不足 256 Token 的内容不会被缓存。
    2.上下文缓存的命中概率并不是100%,即使是上下文完全一致的请求,也存在无法命中的概率,命中概率依据系统判断而定。
    3.若多线程同时请求,缓存无法及时更新!
    参数:
    - user_query: 用户查询
    - need_extra: 是否需要返回额外数据(默认 False)
    返回:
    - 当 need_extra=False 时: 返回响应内容 (str)
    - 当 need_extra=True 时: 返回 (响应内容, token_usage)
    """

    # 内部定义重试参数
    max_retries = 2
    backoff_factor = 2.0

    client = OpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
    )

    for attempt in range(1, max_retries + 2):  # +1 是为了包括初始调用
        try:
            completion_tokens = 0  # 初始化 completion_tokens 为 0
            # 生成基于用户查询的响应
            completion = client.chat.completions.create(
                model="qwen-plus",
                temperature=0.5,
                messages=[
                    {
                        'role': 'user',
                        'content': user_query
                    }
                ],
                stream=True,  # 启用流式响应
                stream_options={"include_usage": True}
            )

            full_response = ""  # 用于存储完整的响应内容

            for chunk in completion:
                # 解析 chunk 数据
                if hasattr(chunk, 'to_dict'):
                    chunk_data = chunk.to_dict()
                else:
                    chunk_data = json.loads(chunk.model_dump_json())

                # 处理 usage 信息
                usage = chunk_data.get('usage')
                if usage is not None:
                    completion_tokens = usage.get('completion_tokens', 0)
                    # prompt_tokens_details = usage.get('prompt_tokens_details', {})  # 命中tokens ,取消注释可以print
                    # cache_hit = prompt_tokens_details.get('cached_tokens', 0)
                    # print("命中:"+str(cache_hit))
                # 处理 choices 信息
                choices = chunk_data.get('choices', [])
                if choices:
                    choice = choices[0]
                    delta = choice.get('delta', {})
                    content = delta.get('content', '')
                    if content:
                        full_response += content
                        # 实时打印内容(可以取消注释下面一行以实时输出)
                        # print(content, end='', flush=True)
                    if choice.get('finish_reason'):
                        # 处理完成原因(如果需要)
                        pass  # 或者记录 finish_reason 以供后续使用

            if need_extra:
                return full_response, completion_tokens
            else:
                return full_response

        except Exception as exc:
            # 提取错误代码
            error_code, error_code_string = extract_error_details(str(exc))
            logger.error(f"第 {attempt} 次尝试失败,查询:'{user_query}',错误:{exc}", exc_info=True)

            if error_code == 429:
                if attempt <= max_retries:
                    sleep_time = backoff_factor * (2 ** (attempt - 1))  # 指数退避
                    logger.warning(f"错误代码为 429,将在 {sleep_time} 秒后重试...")
                    time.sleep(sleep_time)
                else:
                    logger.error(f"查询 '{user_query}' 的所有 {max_retries + 1} 次尝试均失败(429 错误)。")
                    break
            else:
                # 针对非 429 错误,只重试一次
                if attempt <= 1:
                    sleep_time = backoff_factor  # 固定等待时间
                    logger.warning(f"遇到非 429 错误(错误代码:{error_code} - {error_code_string}),将等待 {sleep_time} 秒后重试...")
                    time.sleep(sleep_time)
                    continue  # 直接跳到下一次循环(即重试一次)
                else:
                    logger.error(f"查询 '{user_query}' 的所有 {max_retries + 1} 次尝试均失败(错误代码:{error_code} - {error_code_string})。")
                    break

    # 如果所有尝试都失败了,返回空字符串或默认值
    if need_extra:
        return "", 0
    else:
        return ""

if __name__ == "__main__":
    # Example file path - replace with your actual file path
@@ -3,12 +3,13 @@ import os
import re
import time
from typing import Any, Dict
- from flask_app.general.llm.doubao import read_txt_to_string
+ from flask_app.general.llm.大模型通用函数 import read_txt_to_string
from flask_app.general.file2markdown import convert_file_to_markdown
from flask_app.general.format_change import get_pdf_page_count, pdf2docx
from flask_app.general.json_utils import extract_content_from_json
from flask_app.general.llm.model_continue_query import process_continue_answers
- from flask_app.general.llm.通义千问long_plus import upload_file, qianwen_long, qianwen_plus, qianwen_long_stream
+ from flask_app.general.llm.通义千问long import upload_file, qianwen_long, qianwen_long_stream
+ from flask_app.general.llm.qianwen_plus import qianwen_plus


def remove_unknown_scores(data):
@@ -22,7 +23,11 @@ def remove_unknown_scores(data):
        return [remove_unknown_scores(item) for item in data]
    else:
        return data

def combine_technical_and_business(data):
    '''
    后处理,区分技术评分和商务评分,给外键添加总的评分
    '''
    data = remove_unknown_scores(data)
    extracted_data = {
        '技术评分': {},
@@ -442,9 +447,9 @@ def combine_evaluation_standards(evaluation_method_path,invalid_path,zb_type):

    # 定义用户查询
    query = (
-       """根据该文档,你判断它是否有关于技术评分或商务评分或投标报价的具体的评分及要求如果有,返回'是',否则返回'否'。
+       """根据该文档,你判断它是否有关于技术评分或商务评分或投标报价的具体的评分及要求,如果有,返回'是',否则返回'否'。
        要求与指南:
-       1. 评分要求主要以表格形式呈现,且有评分因素及评分要求、标准。
+       1. 评分要求主要以表格形式呈现,且有评分因素及评分要求、标准,其中评分因素可以是笼统的评分大项如'技术评分'或'商务评分'。
        2. 竞争性磋商文件通常无评分要求,但若满足'1.'的内容,也请返回'是'。
        3. 仅返回'是'或'否',不需要其他解释或内容。
        """
@@ -551,8 +556,8 @@ def combine_evaluation_standards(evaluation_method_path,invalid_path,zb_type):
if __name__ == "__main__":
    start_time = time.time()
    # truncate_file=r"C:\Users\Administrator\Desktop\招标文件-采购类\tmp2\2024-新疆-塔城地区公安局食药环分局快检实验室项目_evaluation_method.pdf"
-   evaluation_method_path = r'D:\flask_project\flask_app\static\output\output1\test\招标文件-第二章-第六章-172404【电能表标准设备172404-1305001-0002】_evaluation_method.pdf'
-   invalid_path=r'D:\flask_project\flask_app\static\output\output1\test\招标文件-第二章-第六章-172404【电能表标准设备172404-1305001-0002】_invalid.docx'
+   evaluation_method_path = r'C:\Users\Administrator\Desktop\货物标\output2\招标文件-通产丽星高端化妆品研发生产总部基地高低压配电工程_evaluation_method11.pdf'
+   invalid_path=r'C:\Users\Administrator\Desktop\货物标\output2\招标文件-通产丽星高端化妆品研发生产总部基地高低压配电工程_evaluation_method11.pdf'
    # truncate_file = "C:\\Users\\Administrator\\Desktop\\货物标\\output2\\2-招标文件(统计局智能终端二次招标)_evaluation_method.pdf"
    # truncate_file="C:\\Users\\Administrator\\Desktop\\货物标\\output2\\广水市妇幼招标文件最新(W改)_evaluation_method.pdf"
    # truncate_file = "C:\\Users\\Administrator\\Desktop\\fsdownload\\2d481945-1f82-45a5-8e56-7fafea4a7793\\ztbfile_evaluation_method.pdf"
@@ -3,7 +3,7 @@ import json
import re
from flask_app.general.json_utils import clean_json_string
from flask_app.general.llm.model_continue_query import process_continue_answers
- from flask_app.general.llm.通义千问long_plus import upload_file, qianwen_long_stream
+ from flask_app.general.llm.通义千问long import upload_file, qianwen_long_stream

# 提取两个大标题之间的内容
def extract_between_sections(data, target_values, flag=False):

@@ -4,8 +4,8 @@ import re
import regex
import time
from concurrent.futures import ThreadPoolExecutor
- from flask_app.general.llm.doubao import generate_full_user_query
- from flask_app.general.llm.通义千问long_plus import qianwen_plus
+ from flask_app.general.llm.大模型通用函数 import generate_full_user_query
+ from flask_app.general.llm.qianwen_plus import qianwen_plus
from flask_app.general.通用功能函数 import process_string_list
from collections import OrderedDict
from docx import Document

@@ -1,9 +1,10 @@
# -*- encoding:utf-8 -*-
import json
import logging
import re
from flask_app.general.json_utils import clean_json_string
from flask_app.general.llm.多线程提问 import multi_threading
- from flask_app.general.llm.通义千问long_plus import upload_file, qianwen_long
+ from flask_app.general.llm.通义千问long import upload_file, qianwen_long
from flask_app.old_version.判断是否分包等_old import read_questions_from_judge

def get_deviation_requirements(invalid_path):

@@ -85,10 +86,10 @@ def aggregate_basic_info(baseinfo_list,mode="engineering"):
    # 定义采购要求的默认值
    DEFAULT_PROCUREMENT_REQS = {
        "采购需求": {},
-       "技术要求": [],
-       "商务要求": [],
-       "服务要求": [],
-       "其他要求": []
+       "技术要求": ["未提供"],
+       "商务要求": ["未提供"],
+       "服务要求": ["未提供"],
+       "其他要求": ["未提供"]
    }
    combined_data = {}
    relevant_keys_detected = set()

@@ -4,7 +4,7 @@ import re

from PyPDF2 import PdfWriter, PdfReader

- from flask_app.general.llm.通义千问long_plus import upload_file, qianwen_long
+ from flask_app.general.llm.通义千问long import upload_file, qianwen_long
from flask_app.general.通用功能函数 import process_string_list

@@ -5,7 +5,7 @@ import re
from flask_app.general.json_utils import extract_content_from_json  # 可以选择性地导入特定的函数
from flask_app.old_version.提取打勾符号_old import read_pdf_and_judge_main
from flask_app.general.llm.多线程提问 import multi_threading
- from flask_app.general.llm.通义千问long_plus import qianwen_long,upload_file
+ from flask_app.general.llm.通义千问long import qianwen_long,upload_file
# 调用qianwen-ask之后,组织提示词问百炼。

def construct_judge_questions(json_data):

@@ -3,7 +3,7 @@ import json

from flask_app.general.json_utils import clean_json_string
from flask_app.general.商务技术评分提取 import combine_technical_and_business
- from flask_app.general.llm.通义千问long_plus import upload_file, qianwen_long
+ from flask_app.general.llm.通义千问long import upload_file, qianwen_long

def combine_evaluation_standards(evaluation_method):
    # 商务标、技术标评分项:千问

@@ -4,7 +4,7 @@ from flask_app.general.json_utils import clean_json_string
from flask_app.工程标.投标人须知正文提取指定内容工程标 import extract_from_notice
from flask_app.old_version.判断是否分包等_old import judge_whether_main, read_questions_from_judge
from flask_app.general.llm.多线程提问 import read_questions_from_file, multi_threading
- from flask_app.general.llm.通义千问long_plus import upload_file
+ from flask_app.general.llm.通义千问long import upload_file
from flask_app.general.通用功能函数 import judge_consortium_bidding

def aggregate_basic_info_engineering(baseinfo_list):

@@ -4,8 +4,8 @@ import re
import regex
import time
from concurrent.futures import ThreadPoolExecutor
- from flask_app.general.llm.doubao import generate_full_user_query
- from flask_app.general.llm.通义千问long_plus import qianwen_plus
+ from flask_app.general.llm.大模型通用函数 import generate_full_user_query
+ from flask_app.general.llm.qianwen_plus import qianwen_plus
from flask_app.general.通用功能函数 import process_string_list
from collections import OrderedDict
from docx import Document

@@ -5,7 +5,7 @@ import time
import re

from flask_app.general.format_change import pdf2docx
- from flask_app.general.llm.通义千问long_plus import upload_file, qianwen_long
+ from flask_app.general.llm.通义千问long import upload_file, qianwen_long
from concurrent.futures import ThreadPoolExecutor

from flask_app.general.table_content_extraction import extract_tables_main

@@ -4,7 +4,7 @@ import time
from flask_app.general.json_utils import clean_json_string
from flask_app.general.商务技术评分提取 import combine_technical_and_business, \
    process_data_based_on_key, reorganize_data
- from flask_app.general.llm.通义千问long_plus import upload_file, qianwen_long
+ from flask_app.general.llm.通义千问long import upload_file, qianwen_long

def combine_evaluation_standards(truncate_file):
    # 定义默认的评审结果字典

@@ -4,7 +4,7 @@ from flask_app.old_version.提取json工程标版_old import convert_clause_to_j
from flask_app.general.json_utils import extract_content_from_json
from flask_app.old_version.形式响应评审old import process_reviews
from flask_app.old_version.资格评审old_old import process_qualification
- from flask_app.general.llm.通义千问long_plus import upload_file, qianwen_long
+ from flask_app.general.llm.通义千问long import upload_file, qianwen_long
from concurrent.futures import ThreadPoolExecutor

@@ -4,7 +4,7 @@ import json
import re
from flask_app.general.json_utils import clean_json_string, combine_json_results, add_keys_to_json
from flask_app.general.llm.多线程提问 import multi_threading, read_questions_from_file
- from flask_app.general.llm.通义千问long_plus import upload_file
+ from flask_app.general.llm.通义千问long import upload_file

def merge_dictionaries_under_common_key(dicts, common_key):

@@ -6,7 +6,7 @@ from functools import wraps
from flask import request, jsonify, current_app, g

from flask_app.general.llm.清除file_id import read_file_ids, delete_file_by_ids
- from flask_app.general.llm.通义千问long_plus import upload_file, qianwen_long
+ from flask_app.general.llm.通义千问long import upload_file, qianwen_long
from flask_app.logger_setup import create_logger

@@ -7,7 +7,7 @@ from copy import deepcopy
from flask_app.general.format_change import docx2pdf,doc2docx
from flask_app.general.json_utils import clean_json_string, rename_outer_key
from flask_app.general.merge_pdfs import merge_pdfs
- from flask_app.general.llm.通义千问long_plus import qianwen_plus
+ from flask_app.general.llm.qianwen_plus import qianwen_plus
from flask_app.general.通用功能函数 import get_global_logger
from flask_app.general.截取pdf_main import truncate_pdf_multiple
from flask_app.货物标.提取采购需求main import fetch_procurement_reqs

@@ -1,7 +1,7 @@
import time
from PyPDF2 import PdfReader  # 确保已安装 PyPDF2: pip install PyPDF2

- from flask_app.general.llm.通义千问long_plus import upload_file, qianwen_long
+ from flask_app.general.llm.通义千问long import upload_file, qianwen_long

def judge_zbfile_exec(file_path):

@@ -6,7 +6,7 @@ import time
from flask_app.general.format_change import docx2pdf
from flask_app.general.json_utils import clean_json_string
from flask_app.general.llm.多线程提问 import read_questions_from_file, multi_threading
- from flask_app.general.llm.通义千问long_plus import upload_file
+ from flask_app.general.llm.通义千问long import upload_file
from flask_app.general.通用功能函数 import get_global_logger,aggregate_basic_info
from flask_app.general.截取pdf_main import truncate_pdf_multiple
from flask_app.general.post_processing import inner_post_processing

@@ -2,6 +2,7 @@
import concurrent.futures

from flask_app.general.llm.doubao import doubao_model
+ from flask_app.general.llm.qianwen_plus import qianwen_plus


# 多线程压力测试
@@ -17,7 +18,7 @@ def multi_threaded_calls_doubao(user_query, num_threads=10):
    # 定义一个辅助函数来调用 doubao_model 并返回结果
    def call_function(thread_id):
        print(f"线程 {thread_id} 开始调用 doubao_model")
-       result = doubao_model(user_query)
+       result = qianwen_plus(user_query)
        return result

    # 使用 ThreadPoolExecutor 来管理线程池

@@ -1,5 +1,5 @@
import concurrent.futures
- from flask_app.general.llm.通义千问long_plus import qianwen_long, upload_file
+ from flask_app.general.llm.通义千问long import qianwen_long, upload_file


def multi_threaded_calls(file_id, user_query, num_threads=1):
flask_app/test_case/test_轮询.py | 35 (new file)
@@ -0,0 +1,35 @@
import os
from itertools import cycle
import threading
from concurrent.futures import ThreadPoolExecutor

# 定义三个 API Key 的循环器
api_keys = cycle([1, 2, 3])
# 使用锁保证线程安全
lock = threading.Lock()

def test():
    # 使用锁保护对 api_keys 的访问
    with lock:
        key = next(api_keys)
    print(key)
    return key

def main():
    results = []
    num_calls = 100  # 模拟 100 次并发调用
    # 使用 ThreadPoolExecutor 模拟高并发
    with ThreadPoolExecutor(max_workers=20) as executor:
        futures = [executor.submit(test) for _ in range(num_calls)]
        # 收集所有返回的结果
        for future in futures:
            results.append(future.result())

    # 统计各个 API Key 被使用的次数
    counts = {1: 0, 2: 0, 3: 0}
    for key in results:
        counts[key] += 1
    print("调用分布:", counts)

if __name__ == '__main__':
    main()
@@ -7,7 +7,7 @@ from flask_app.general.通用功能函数 import process_judge_questions, aggreg
from flask_app.general.投标人须知正文提取指定内容 import extract_from_notice
from flask_app.old_version.判断是否分包等_old import merge_json_to_list
from flask_app.general.llm.多线程提问 import read_questions_from_file, multi_threading
- from flask_app.general.llm.通义千问long_plus import upload_file
+ from flask_app.general.llm.通义千问long import upload_file

def update_baseinfo_lists(baseinfo_list1, baseinfo_list2):
    # 创建一个字典,用于存储 baseinfo_list1 中的所有键值对

@@ -8,7 +8,7 @@ from flask_app.general.llm.多线程提问 import multi_threading
from flask_app.工程标.根据条款号整合json import process_and_merge_entries,process_and_merge2
from flask_app.general.json_utils import clean_json_string
from flask_app.general.投标人须知正文条款提取成json文件 import convert_clause_to_json
- from flask_app.general.llm.通义千问long_plus import upload_file
+ from flask_app.general.llm.通义千问long import upload_file
from flask_app.general.merge_pdfs import merge_pdfs
def update_json_data(original_data, updates1, updates2,second_response_list):
    """

@@ -251,12 +251,12 @@ if __name__ == "__main__":
    logger = get_global_logger("123")
    start_time = time.time()
    # input_path = r"C:\Users\Administrator\Desktop\new招标文件\工程标"
-   pdf_path=r"C:\Users\Administrator\Desktop\新建文件夹 (3)\新建文件夹\(存储存在问题)惠安县招标文件.pdf"
+   pdf_path=r"C:\Users\Administrator\Desktop\货物标\zbfiles\招标文件-通产丽星高端化妆品研发生产总部基地高低压配电工程.pdf"

    # pdf_path = r"C:\Users\Administrator\Desktop\招标文件\招标02.pdf"
    # input_path=r"C:\Users\Administrator\Desktop\招标文件\招标test文件夹\zbtest8.pdf"
-   output_folder = r"C:\Users\Administrator\Desktop\新建文件夹 (3)\新建文件夹"
-   selection = 4  # 例如:1 - 公告 notice , 2 - 评标办法 evaluation_method, 3 - 资格审查后缀有qualification1或qualification2(与评标办法一致) 4.投标人须知前附表part1 投标人须知正文part2 5-invalid
+   output_folder = r"C:\Users\Administrator\Desktop\货物标\output2"
+   selection = 2  # 例如:1 - 公告 notice , 2 - 评标办法 evaluation_method, 3 - 资格审查后缀有qualification1或qualification2(与评标办法一致) 4.投标人须知前附表part1 投标人须知正文part2 5-invalid
    generated_files = truncate_pdf_main_engineering(pdf_path, output_folder, selection, logger)
    print(generated_files)
    # print("生成的文件:", generated_files)

@@ -7,7 +7,7 @@ from flask_app.general.json_utils import extract_content_from_json, clean_json_s
from flask_app.general.table_content_extraction import extract_tables_main
from flask_app.工程标.形式响应评审 import process_reviews
from flask_app.工程标.资格评审 import process_qualification
- from flask_app.general.llm.通义千问long_plus import upload_file, qianwen_long
+ from flask_app.general.llm.通义千问long import upload_file, qianwen_long
from concurrent.futures import ThreadPoolExecutor
from flask_app.货物标.资格审查main import combine_qualification_review
from flask_app.general.merge_pdfs import merge_pdfs

@@ -4,7 +4,7 @@ import json
import re
from flask_app.general.json_utils import clean_json_string, combine_json_results, add_keys_to_json
from flask_app.general.llm.多线程提问 import multi_threading, read_questions_from_file
- from flask_app.general.llm.通义千问long_plus import upload_file, qianwen_long
+ from flask_app.general.llm.通义千问long import upload_file, qianwen_long


# 这个函数的主要用途是将多个相关的字典(都包含 'common_key' 键)合并成一个更大的、综合的字典,所有相关信息都集中在 'common_key' 键下

@@ -4,11 +4,12 @@ import re
import fitz
from PyPDF2 import PdfReader
import textwrap
- from flask_app.general.llm.doubao import read_txt_to_string
+ from flask_app.general.llm.大模型通用函数 import read_txt_to_string
from flask_app.general.json_utils import clean_json_string
from flask_app.general.llm.model_continue_query import process_continue_answers
from flask_app.general.截取pdf通用函数 import create_get_text_function
- from flask_app.general.llm.通义千问long_plus import upload_file, qianwen_long_stream, qianwen_plus
+ from flask_app.general.llm.通义千问long import upload_file, qianwen_long_stream
+ from flask_app.general.llm.qianwen_plus import qianwen_plus
from flask_app.general.读取文件.clean_pdf import extract_common_header, clean_page_content
from flask_app.general.format_change import docx2pdf, pdf2docx
import concurrent.futures

@@ -6,7 +6,7 @@ import concurrent.futures
from flask_app.general.json_utils import clean_json_string, add_outer_key
from flask_app.general.通用功能函数 import process_judge_questions, aggregate_basic_info
from flask_app.general.llm.多线程提问 import read_questions_from_file, multi_threading
- from flask_app.general.llm.通义千问long_plus import upload_file
+ from flask_app.general.llm.通义千问long import upload_file
from flask_app.old_version.判断是否分包等_old import merge_json_to_list
from flask_app.general.投标人须知正文提取指定内容 import extract_from_notice
from flask_app.货物标.提取采购需求main import fetch_procurement_reqs
@@ -60,19 +60,20 @@ def combine_basic_info(merged_baseinfo_path, procurement_path,clause_path,invali
    # 合并结果
    baseinfo_list += temp_list  # temp_list 是一个列表
    baseinfo_list.append(procurement_reqs)  # procurement_reqs 是一个字典
-   aggregated_baseinfo = aggregate_basic_info(baseinfo_list,'goods')
+   aggregated_baseinfo = aggregate_basic_info(baseinfo_list,'goods')  # 重新组织基础信息结构

    return {"基础信息": aggregated_baseinfo}

if __name__ == "__main__":
    start_time=time.time()
    # baseinfo_file_path = "C:\\Users\\Administrator\\Desktop\\货物标\\truncate_all\\ztbfile_merged_baseinfo\\ztbfile_merged_baseinfo_3-31.pdf"
-   merged_baseinfo_path=r"C:\Users\Administrator\Desktop\fsdownload\0c80edcc-cc86-4d53-8bd4-78a531446760\ztbfile.docx"
+   merged_baseinfo_path=r"D:\flask_project\flask_app\static\output\output1\ecd3374a-a19e-475e-a4dd-e803a3bc1fbf\ztbfile_merged_baseinfo.pdf"
    # procurement_file_path = "C:\\Users\\Administrator\\Desktop\\fsdownload\\b4601ea1-f087-4fa2-88ae-336ad4d8e1e9\\tmp\\ztbfile_procurement.pdf"
-   clause_path=r'D:\flask_project\flask_app\static\output\output1\3783ce68-1839-4449-97e6-cd07749d8664\clause1.json'
-   invalid_path=r"C:\Users\Administrator\Desktop\fsdownload\0c80edcc-cc86-4d53-8bd4-78a531446760\ztbfile.docx"
+   clause_path=r'D:\flask_project\flask_app\static\output\output1\ecd3374a-a19e-475e-a4dd-e803a3bc1fbf\clause1.json'
+   invalid_path=r"D:\flask_project\flask_app\static\output\output1\ecd3374a-a19e-475e-a4dd-e803a3bc1fbf\ztbfile.docx"
+   procurement_path=r'D:\flask_project\flask_app\static\output\output1\ecd3374a-a19e-475e-a4dd-e803a3bc1fbf\ztbfile_procurement.pdf'
    # res = combine_basic_info(merged_baseinfo_path, procurement_file_path,clause_path)
-   res=combine_basic_info(merged_baseinfo_path,"","",invalid_path)
+   res=combine_basic_info(merged_baseinfo_path,procurement_path,clause_path,invalid_path)
    print("------------------------------------")
    print(json.dumps(res, ensure_ascii=False, indent=4))
    end_time=time.time()

@@ -319,11 +319,11 @@ if __name__ == "__main__":
    logger = get_global_logger("123")
    # input_path = r"C:\Users\Administrator\Desktop\new招标文件\货物标"
    # pdf_path = r"C:\Users\Administrator\Desktop\招标文件-采购类\2024-贵州-贵州医科大学附属医院导视系统零星制作安装项目.pdf"
-   pdf_path=r"C:\Users\Administrator\Desktop\货物标\zbfiles\招标文件(107国道).pdf"
+   pdf_path=r"C:\Users\Administrator\Desktop\货物标\zbfiles\file1737339029349.pdf"
    # input_path = r"C:\Users\Administrator\Desktop\货物标\zbfiles\2-招标文件(广水市教育局封闭管理).pdf"
    # pdf_path=r"C:\Users\Administrator\Desktop\文件解析问题\文件解析问题\1414cb9c-7bf4-401c-8761-2acde151b9c2\ztbfile.pdf"
-   output_folder = r"C:\Users\Administrator\Desktop\货物标\zbfiles\output6"
+   output_folder = r"C:\Users\Administrator\Desktop\货物标\output2"
    # output_folder = r"C:\Users\Administrator\Desktop\new招标文件\output2"
-   selection = 6  # 例如:1 - 公告 notice , 2 - 评标办法 evaluation_method, 3 - 资格审查后缀有qualification1或qualification2(与评标办法一致) 4.投标人须知前附表part1 投标人须知正文part2 5-采购需求 procurement 6-invalid
+   selection = 2  # 例如:1 - 公告 notice , 2 - 评标办法 evaluation_method, 3 - 资格审查后缀有qualification1或qualification2(与评标办法一致) 4.投标人须知前附表part1 投标人须知正文part2 5-采购需求 procurement 6-invalid
    generated_files = truncate_pdf_main_goods(pdf_path, output_folder, selection,logger)
    print(generated_files)

@@ -8,9 +8,10 @@ from copy import deepcopy
from flask_app.general.llm.model_continue_query import process_continue_answers
from flask_app.general.format_change import pdf2docx
from flask_app.general.llm.多线程提问 import multi_threading
- from flask_app.general.llm.通义千问long_plus import qianwen_long, upload_file, qianwen_plus
+ from flask_app.general.llm.通义千问long import qianwen_long, upload_file
+ from flask_app.general.llm.qianwen_plus import qianwen_plus
from flask_app.general.json_utils import clean_json_string
- from flask_app.general.llm.doubao import read_txt_to_string
+ from flask_app.general.llm.大模型通用函数 import read_txt_to_string
from flask_app.货物标.技术参数要求提取后处理函数 import main_postprocess

@@ -3,7 +3,7 @@ import json
import re
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
- from flask_app.general.llm.通义千问long_plus import upload_file, qianwen_long
+ from flask_app.general.llm.通义千问long import upload_file, qianwen_long
from flask_app.general.json_utils import clean_json_string