2024-12-03 11:50:15 +08:00

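# tbpu : text block processing unit
# Base class for text block processors.
# Each element of an OCR result that holds text, a bounding box, and a confidence score is called a "text block".
# A text block is not necessarily a complete sentence or paragraph; more often it is a scattered fragment of text.
# One OCR result usually consists of multiple text blocks.
# A text block processor takes the incoming text blocks and processes them, for example by merging, sorting, or deleting blocks.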
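class Tbpu:
    def __init__(self):
        # Display name; the string means "text block processing unit - unknown".
        self.tbpuName = "文块处理单元-未知"

    def run(self, textBlocks):
        """Takes textBlocks, a list of text blocks. Example:\n
        [
            {'box': [[29, 19], [172, 19], [172, 44], [29, 44]], 'score': 0.89, 'text': '文本111'},
            {'box': [[29, 60], [161, 60], [161, 86], [29, 86]], 'score': 0.75, 'text': '文本222'},
        ]
        Returns the sorted list of text blocks, with one key added to each block:
            'end': the separator that follows the block
        """
        return textBlocks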
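
A minimal sketch of how a subclass might fulfil the `run` contract described in the docstring (return the blocks sorted, each with an added `'end'` key). The subclass name `TbpuSortTopDown`, its sort rule (top-left corner, top to bottom, then left to right), and the `"\n"` separator are assumptions made for illustration; they are not taken from the original project. The base class is repeated in shortened form only so the sketch runs on its own.

```python
class Tbpu:
    """Shortened copy of the base class, repeated so this sketch is self-contained."""

    def __init__(self):
        self.tbpuName = "tbpu-unknown"

    def run(self, textBlocks):
        return textBlocks


class TbpuSortTopDown(Tbpu):  # hypothetical subclass, not part of the original file
    def __init__(self):
        super().__init__()
        self.tbpuName = "tbpu-sort-top-down"  # assumed name

    def run(self, textBlocks):
        # Sort by the top-left corner of each box: y first, then x.
        ordered = sorted(
            textBlocks,
            key=lambda tb: (tb["box"][0][1], tb["box"][0][0]),
        )
        # Copy each block and add the 'end' key; "\n" is an assumed separator.
        return [{**tb, "end": "\n"} for tb in ordered]


blocks = [
    {"box": [[29, 60], [161, 60], [161, 86], [29, 86]], "score": 0.75, "text": "文本222"},
    {"box": [[29, 19], [172, 19], [172, 44], [29, 44]], "score": 0.89, "text": "文本111"},
]
result = TbpuSortTopDown().run(blocks)
print([tb["text"] for tb in result])
```

Building new dictionaries instead of mutating the input keeps the caller's list unchanged, which makes it easier to chain several processors one after another.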