Uni-Parser
I.

Introduction

Choose between UniParser-Tools and UniParser-API, and learn the basic configuration you need before making calls.

Uni-Parser supports two integration methods:

| Method | Description | When to use it |
| --- | --- | --- |
| UniParser-Tools | A high-level Python library wrapper | Recommended for most users. It wraps task submission, polling, result fetching, and common output conversion for scripts, batch jobs, research workflows, and agent integrations. |
| UniParser-API | Native HTTP calls | Use this when you need direct control over HTTP requests, authentication, polling, callbacks, output switches, or system integration details. |

If you are unsure which path to choose, start with UniParser-Tools. Use UniParser-API only when you need fine-grained control over the HTTP workflow.

Minimal workflow:

  1. Get an X-API-Key.
  2. Submit a file, image snippet, or URL parsing task.
  3. Use sync=true to wait synchronously, or submit asynchronously and fetch results later with the token.
  4. Fetch content, objects, pages_dict, pages_tree, or other result fields.
  5. Choose Markdown, HTML, LaTeX, Plain, or other output formats as needed.
II.

UniParser-Tools

A Python client wrapper recommended for most users.

Quick Start

The following is the basic workflow. For complete and latest examples, see the repo README.md and notebooks.

1. Install

UniParser-Tools is a Python client wrapper for UniParser-API, suitable for scripts, notebooks, batch processing, and agent workflows.

git clone https://github.com/dptech-corp/UniParser-Tools.git
cd UniParser-Tools
pip install -r requirements.txt
pip install -e .

2. Initialize the client

Provide an api_key when initializing the client. See User Authentication for authentication and API-Key configuration details.

import os
from uniparser_tools.api.clients import UniParserClient

parser = UniParserClient(
    host="https://uniparser.dp.tech/",
    api_key=os.getenv("UNIPARSER_API_KEY"),
)

3. Submit a PDF parsing task

UniParser-Tools wraps different parsing entry points:

  • File parsing: trigger_file
  • Image snippet parsing: trigger_snip
  • URL parsing: trigger_url

See Request Format for common request formats. UniParser-Tools automatically generates a unique token and submits it with the trigger request, so you usually do not need to provide one manually; pass one explicitly only when you need to reuse an external task index.

Parsing options control which semantic elements are parsed when submitting a task, and whether each element uses fast parsing, high-quality parsing, or returns the original image. ParseMode / ParseModeTextual are only for the submit stage; do not pass them to get_formatted. See Parsing Options SemanticType for details.

from uniparser_tools.common.constant import ParseMode, ParseModeTextual

result = parser.trigger_file(
    file_path="example.pdf",
    textual=ParseModeTextual.OCRHighQuality,  # high quality
    equation=ParseMode.OCRHighQuality,        # high quality
    table=ParseMode.OCRHighQuality,           # high quality
    chart=ParseMode.DumpBase64,               # original image base64
    figure=ParseMode.DumpBase64,              # original image base64
    expression=ParseMode.DumpBase64,          # original image base64
    molecule=ParseMode.OCRFast,               # fast
)

token = result["token"]

Use trigger_snip for image snippets, screenshots, page regions, or single images:

snip_result = parser.trigger_snip(
    snip_path="example.png",
    textual=ParseModeTextual.OCRHighQuality,
    table=ParseMode.OCRHighQuality,
    equation=ParseMode.OCRHighQuality,
)

snip_token = snip_result["token"]

Use trigger_url for publicly accessible PDF URLs:

url_result = parser.trigger_url(
    pdf_url="https://example.com/paper.pdf",
    textual=ParseModeTextual.OCRHighQuality,
    table=ParseMode.OCRHighQuality,
    equation=ParseMode.OCRHighQuality,
)

url_token = url_result["token"]

4. Fetch Markdown output

When fetching results, output switches control which fields are returned: the formatted full text (content), semantic blocks (objects), page-level structures (pages_dict), or hierarchical structures (pages_tree). This example sets content=True to fetch the formatted full text. FormatFlag applies only to the result-fetching stage; do not pass it to trigger_file. See Output Switches for the available switches, and Output FormatFlag for Markdown, HTML, LaTeX, and other formats.

from uniparser_tools.common.constant import FormatFlag

result = parser.get_formatted(
    token,
    content=True,
    textual=FormatFlag.Markdown,
    table=FormatFlag.Markdown,
    equation=FormatFlag.Markdown,
)

print(result["content"])
III.

UniParser-API

Submit parsing tasks and fetch results directly with curl or requests.

The native API lets you submit parsing tasks and fetch results directly over HTTP. The examples below use curl and requests; they do not rely on the UniParser-Tools client.

1. Submit a Parsing Task

Use /trigger-file-async to upload a PDF and set token plus parsing options as form fields. token is required by trigger endpoints; when using curl / requests, you must provide it manually:

# sync=true waits for parsing to finish; 2 means high-quality parsing, 1 means fast parsing.
curl -X POST "https://uniparser.dp.tech/trigger-file-async" \
  -H "X-API-Key: your_api_key" \
  -F "token=example_001" \
  -F "file=@example.pdf" \
  -F "sync=true" \
  -F "textual=2" \
  -F "equation=2" \
  -F "table=2" \
  -F "molecule=1"

Python requests example:

import os
import requests

host = "https://uniparser.dp.tech/"
headers = {"X-API-Key": os.environ["UNIPARSER_API_KEY"]}
token = "example_001"
data = {
    "token": token,
    "sync": "true",  # wait for parsing to finish; use "false" for large files or batches
    "textual": "2",  # 2 = high-quality parsing
    "equation": "2",  # 2 = high-quality parsing
    "table": "2",  # 2 = high-quality parsing
    "molecule": "1",  # 1 = fast parsing
}

with open("example.pdf", "rb") as file:
    response = requests.post(
        f"{host}trigger-file-async",
        headers=headers,
        data=data,
        files={"file": file},
        timeout=60,  # client-side HTTP timeout; raise it if sync=true parsing may take longer
    )
response.raise_for_status()

2. Fetch Results

Use the token you provided when submitting the task to fetch formatted results. /get-formatted accepts a JSON body. For Markdown full text:

curl -X POST "https://uniparser.dp.tech/get-formatted" \
  -H "X-API-Key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "token": "example_001",
    "content": true,
    "textual": "markdown",
    "table": "markdown",
    "equation": "markdown"
  }'

Python requests example:

payload = {
    "token": token,
    "content": True,
    "textual": "markdown",
    "table": "markdown",
    "equation": "markdown",
}

response = requests.post(
    f"{host}get-formatted",
    headers=headers,
    json=payload,
    timeout=60,
)
response.raise_for_status()
result = response.json()
print(result["content"])

If you need coordinates, block types, or hierarchy, enable objects, pages_dict, or pages_tree.

IV.

Reference

A central reference for the layout types, output formats, and error codes that appear in parsing results.

User Authentication

UniParser-API uses the X-API-Key request header. You can register a visitor account on the UniParser homepage to get a basic API-Key. For higher quotas, broader permissions, or a long-term account, please contact us.

curl -X POST "https://uniparser.dp.tech/trigger-file-async" \
  -H "X-API-Key: your_api_key" \
  -F "token=example_001" \
  -F "file=@example.pdf" \
  -F "sync=true"

Store the key in an environment variable such as UNIPARSER_API_KEY instead of hard-coding it in source code.
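
As a minimal sketch of that recommendation, the helper below reads the key from the environment and fails early when it is missing. The function names `load_api_key` and `auth_headers` are hypothetical and not part of UniParser-Tools; only the `UNIPARSER_API_KEY` variable name and the `X-API-Key` header come from this document.

```python
import os


def load_api_key(env=None, var_name="UNIPARSER_API_KEY"):
    """Return the API key from the environment, or raise if it is unset or blank."""
    env = os.environ if env is None else env
    key = (env.get(var_name) or "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before calling the API")
    return key


def auth_headers(api_key):
    """Build the authentication header described in this section."""
    return {"X-API-Key": api_key}
```

Failing at startup is usually easier to debug than a 401 response from the first request.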

Request Format

| Endpoint | Request format | Notes |
| --- | --- | --- |
| /trigger-file-async | multipart/form-data | Upload a PDF in the file field; other parameters are also form fields |
| /trigger-snip-async / /trigger-url-async | JSON body | Image snippet or URL parsing requests |
| /get-formatted | JSON body | Formatted result-fetching request |

Note: every trigger endpoint requires a token. The token is the unique index for a parsing task, similar to a UUID. UniParser-Tools generates a unique token automatically; when using curl or requests with UniParser-API, you must provide it manually and ensure uniqueness. UUIDs are recommended.
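
For native API calls, a token can be generated along these lines. This is a sketch only: `new_token` and `is_valid_token` are hypothetical helper names, and the character check assumes that "letters" in the Token_Invalid description means ASCII letters.

```python
import re
import uuid

# Allowed characters per the Token_Invalid description: letters, digits,
# underscores, and hyphens (assumed here to mean ASCII letters).
_TOKEN_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def new_token(prefix="task"):
    """Return a unique token such as 'task_<32 hex chars>' built from a random UUID."""
    return f"{prefix}_{uuid.uuid4().hex}"


def is_valid_token(token):
    """Check that a token is non-empty and uses only the allowed characters."""
    return bool(_TOKEN_PATTERN.fullmatch(token))
```

Using `uuid4().hex` avoids having to track previously used tokens on the client side.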

sync controls how the trigger request waits: sync=false is the default asynchronous mode and returns task status plus token immediately, so you fetch results later with the same token; sync=true waits in the submit request until parsing finishes or times out, which is convenient for examples, debugging, and small files. For large files or batch processing, prefer async mode and set timeout according to task size (default: 1800 seconds).
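
The asynchronous path needs some form of polling on the client. This document does not specify how an unfinished task is reported, so the sketch below keeps that part abstract: `fetch` and `is_ready` are caller-supplied functions (hypothetical names), and only the 1800-second default mirrors the timeout mentioned above.

```python
import time


def poll_until_ready(fetch, is_ready, interval=5.0, timeout=1800.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call fetch() repeatedly until is_ready(result) is true or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        result = fetch()
        if is_ready(result):
            return result
        # Stop if the next wait would run past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval)
```

In practice `fetch` could wrap a result-fetching request for the task's token, and `is_ready` could inspect whatever completion signal the response carries.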

Parsing Options SemanticType

When submitting a parsing task, you can control these semantic element fields:

Stage note: SemanticType and ParseMode / ParseModeTextual belong to the submit stage. They decide what to parse and at what quality; do not use them in /get-formatted or get_formatted.

| Field | Meaning | Available modes |
| --- | --- | --- |
| textual | Prose text, headings, footnotes, etc. | 0 / 1 / 2 / 3 |
| equation | Display mathematical equations | 0 / 1 / 2 / -1 |
| table | Tables | 0 / 1 / 2 / -1 |
| chart | Charts | 0 / 1 / -1 |
| figure | Figures, illustrations, photos | 0 / 1 / -1 |
| expression | Chemical reaction expressions | 0 / 1 / -1 |
| molecule | Chemical molecular structures | 0 / 1 / -1 |

Value notes: 0 disables parsing, 1 uses fast parsing, 2 uses high-quality parsing, and -1 returns the original image as Base64. textual also supports 3 for direct text extraction from digital PDFs.

| Enum name | Value |
| --- | --- |
| ParseMode.DumpBase64 / ParseModeTextual.DumpBase64 | -1 |
| ParseMode.OCRFast / ParseModeTextual.OCRFast | 1 |
| ParseMode.OCRHighQuality / ParseModeTextual.OCRHighQuality | 2 |
| ParseModeTextual.DigitalExported | 3 (textual only) |

Note: chart, expression, and molecule currently do not have a high-quality mode.
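
When calling the native API, it can help to reject unsupported mode values before sending a request. The sketch below encodes the per-field modes listed in this section; `build_trigger_form` is a hypothetical helper, not part of UniParser-Tools, and it only builds the form fields without sending anything.

```python
# Allowed numeric modes per field, taken from the Parsing Options table in this section.
ALLOWED_MODES = {
    "textual": {0, 1, 2, 3},
    "equation": {0, 1, 2, -1},
    "table": {0, 1, 2, -1},
    "chart": {0, 1, -1},
    "figure": {0, 1, -1},
    "expression": {0, 1, -1},
    "molecule": {0, 1, -1},
}


def build_trigger_form(token, sync=False, **modes):
    """Return form fields for a trigger request, rejecting unsupported modes."""
    form = {"token": token, "sync": "true" if sync else "false"}
    for field, mode in modes.items():
        if field not in ALLOWED_MODES:
            raise ValueError(f"unknown semantic field: {field}")
        if mode not in ALLOWED_MODES[field]:
            raise ValueError(f"mode {mode} is not available for {field}")
        form[field] = str(mode)
    return form
```

The resulting dictionary can be passed as `data=` to `requests.post` alongside the uploaded file.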

Table Mode: fast vs hq

  • table=1 (fast) returns a combination of a structure template and cell text. You usually need structure, placeholders, and contents together to reconstruct the table: structure describes the table skeleton, placeholders mark cell placeholders, and contents provides the corresponding cell text.
  • table=2 (hq) usually puts the complete table data directly in structure, such as an HTML / Markdown table with text content. In this mode, you generally do not need to stitch cell content back from placeholders and contents.
  • Therefore, fast is better when you want to handle the mapping between table skeleton and cell text yourself; hq is better when you want to consume a complete table structure directly. High-quality mode usually takes longer; if the high-quality service is unavailable, UniParser falls back to an available fast table parser.
  • Table mode only affects table-region structure recognition and cell-content recognition. Table captions, table footnotes, and surrounding body text are mainly controlled by textual; when you want both table text and body text to use higher quality, usually set both table=2 and textual=2.
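
The fast-mode reconstruction can be sketched as a placeholder substitution. The exact shape of structure, placeholders, and contents is not specified in this document, so the code assumes (hypothetically) that structure is a string template, placeholders is a list of unique marker strings that appear in it, and contents is a parallel list of cell texts. Inspect a real response before relying on this.

```python
import re


def fill_table(structure, placeholders, contents):
    """Replace each placeholder in the structure template with its cell text."""
    if len(placeholders) != len(contents):
        raise ValueError("placeholders and contents must have the same length")
    if not placeholders:
        return structure
    cell_text = dict(zip(placeholders, contents))
    # Match longer markers first so one marker cannot shadow another that extends it.
    ordered = sorted(cell_text, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(marker) for marker in ordered))
    # A single pass keeps inserted cell text from being re-scanned for markers.
    return pattern.sub(lambda match: cell_text[match.group(0)], structure)
```

With table=2 (hq) this step is generally unnecessary, since structure usually already holds the complete table.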

Output Switches

The task token is used for subsequent result fetching. Use /get-formatted with output switches to choose what to fetch:

Note: result-fetching endpoints use a JSON body. When calling UniParser-API directly, set Content-Type: application/json; omitted fields use default values.

| Switch | Default | Returned field type | Description |
| --- | --- | --- | --- |
| content | true | str | Formatted full-document content, useful for LLMs, search, and reading |
| objects | false | list[dict] | Flat semantic block list. Each item includes fields such as class, confidence, float_xyxy, page, and str; useful when coordinates, block types, and block-level content are needed |
| pages_dict | false | list[list[dict]] | Flattened page-level layout. Each page is a list of layout blocks, useful for page-level structured processing |
| pages_tree | false | list[list[dict]] | Page-level tree layout that preserves parent-child hierarchy, useful when hierarchical structure is needed |
| molecule_source | false | Control option | Keeps original molecule source information in structured outputs. It defaults to disabled to reduce response size; when disabled, molecule block source is cleared |
| marginalia | true | Control option | Controls whether marginal / side-bar information is kept in formatted output. It defaults to enabled to avoid dropping source-document information |

Output FormatFlag

/get-formatted can specify output formats for these semantic element fields: textual, table, molecule, chart, figure, expression, and equation.

Stage note: FormatFlag belongs to the result-fetching stage. It only controls the format of returned text; do not use it in trigger requests or trigger_file.

| Value | Name | Meaning | Example |
| --- | --- | --- | --- |
| plain | Plain | Plain text, no markers | ATP inhibits ATR. |
| markup | Markup | Marker-wrapped text, default | \begin{paragraph}ATP inhibits ATR.\end{paragraph} |
| markdown | Markdown | Markdown, suitable for LLMs and documents | ## Results (newline) - IC50: 0.5 nM |
| latex | Latex | LaTeX, especially useful for equations | $E = mc^2$ |
| html | Html | HTML, especially useful for tables | <table>...</table> |

Structured outputs such as objects, pages_dict, and pages_tree are not affected by FormatFlag; formatting mainly applies to text fields in content and objects.
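
A request body for /get-formatted combines output switches with per-field format values. The sketch below validates both against the names listed in this reference; `build_get_formatted_payload` is a hypothetical helper, and keys that are left out are simply omitted so the server-side defaults apply.

```python
FORMAT_VALUES = {"plain", "markup", "markdown", "latex", "html"}
FORMAT_FIELDS = {"textual", "table", "molecule", "chart", "figure", "expression", "equation"}
OUTPUT_SWITCHES = {"content", "objects", "pages_dict", "pages_tree", "molecule_source", "marginalia"}


def build_get_formatted_payload(token, switches=None, formats=None):
    """Return a JSON-serializable body for /get-formatted."""
    payload = {"token": token}
    for name, enabled in (switches or {}).items():
        if name not in OUTPUT_SWITCHES:
            raise ValueError(f"unknown output switch: {name}")
        payload[name] = bool(enabled)
    for field, fmt in (formats or {}).items():
        if field not in FORMAT_FIELDS:
            raise ValueError(f"unknown format field: {field}")
        if fmt not in FORMAT_VALUES:
            raise ValueError(f"unknown format value: {fmt}")
        payload[field] = fmt
    return payload
```

The returned dictionary can be passed as `json=` to `requests.post`, which also sets the Content-Type header to application/json.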

Common Layout Types LayoutType

In objects, pages_dict, and pages_tree, each parsed block has a type field describing what the block represents on the page.

| Type | Meaning | Usually belongs to |
| --- | --- | --- |
| documenttitle | Document title | textual |
| title | Section heading | textual |
| paragraph | Main body paragraph | textual |
| abstract | Abstract | textual |
| reference | Reference entry | textual |
| table | Table body | table |
| tablecaption / tablefootnote | Table caption / footnote | textual, semantically attached to a table |
| equation / equationinline | Display / inline equation | equation |
| molecule / moleculeid | Molecular structure / molecule label | molecule or related text |
| figure / image | Figure, illustration, photo, or image region | figure |
| figurecaption / imagecaption | Figure / image caption | textual, semantically attached to an image |
| expression / expressioncaption | Chemical reaction expression / caption | expression or related text |
| chart / legend | Chart / legend | chart or related text |
| pageheader / pagefooter / pagenumber | Header, footer, page number | Marginal information |
| group / tablegroup / figuregroup / moleculegroup | Group node | Structural node, common in pages_tree |

type is a result label. It is not the same layer as parsing options such as textual, table, or molecule.
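
A common first step with objects is to group blocks by their layout label. The sketch below assumes each block is a dict and that the label sits under the `type` key as described above; the Output Switches table also lists a `class` field, so the key is left configurable. `group_blocks_by_type` is a hypothetical helper name.

```python
from collections import defaultdict


def group_blocks_by_type(blocks, key="type"):
    """Group flat result blocks by their layout label, preserving original order."""
    grouped = defaultdict(list)
    for block in blocks:
        grouped[block.get(key, "unknown")].append(block)
    return dict(grouped)
```

For example, `group_blocks_by_type(result["objects"]).get("table", [])` would collect the table blocks from a response fetched with objects enabled.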

API And Frontend Limits

UniParser checks both API account quotas and frontend upload limits. API limits depend on account type, permissions, and approved quota configuration; frontend limits only apply to uploads through the web page and do not represent the final parsing quota for direct API calls.

| Limit type | Applies to | Default limit | Notes |
| --- | --- | --- | --- |
| API file size / page count | Visitor standard accounts | 200 MB, 2000 pages | Requests beyond the quota return file-size or page-count limit errors |
| API file size / page count | Paid users / accounts with higher quotas | Default 300 MB, 6000 pages | The actual quota follows the API application document or account provisioning configuration and can be adjusted on request |
| Frontend upload size | Web upload page | 300 MB | This only limits files uploaded from the frontend; direct API calls are still governed by the account API quota |

Request-rate limits are enforced by account, endpoint, and server-side configuration. When a request is rate-limited, the API returns 429, usually with the specific limit that was hit. For batch jobs, prefer asynchronous submission, control concurrency, and use backoff before retrying. If you need higher request throughput or larger file quotas, include that requirement in the API permission request.
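
The backoff advice can be sketched as a small retry wrapper. Everything here is illustrative: `RateLimited` and `call_with_backoff` are hypothetical names, and the caller-supplied `send` function is expected to raise `RateLimited` when the API answers with 429. The delays are arbitrary starting points, not values documented by UniParser.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller-supplied function when the API answers with HTTP 429."""


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep, jitter=random.random):
    """Call send() and retry with exponential backoff while it raises RateLimited."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Scale by a random factor in [0.5, 1.0) so parallel workers
            # do not all retry at the same moment.
            sleep(delay * (0.5 + 0.5 * jitter()))
```

A `send` function would typically wrap `requests.post(...)` and raise `RateLimited` when `response.status_code == 429`.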

Error Codes ErrorFlag

Request parameter errors (400)

  • Token_Required: missing token parameter
  • Token_Invalid: token contains invalid characters; only letters, numbers, underscores, and hyphens are allowed
  • Token_Duplicated: token already exists and the task is not finished
  • File_Required: upload file is missing
  • File_Invalid: invalid PDF or zero-page PDF
  • URL_Invalid: URL is not a valid PDF or download failed
  • Request_Invalid: request parameter format or value is invalid

Meaning: required parameters, file content, URL accessibility, or field values are invalid.
What to do: fill in required fields; generate a new unique token for native API calls; verify that the file opens, the URL downloads, and field values match the docs.

Parsing status errors (400)

  • Status_Not_Found: no status file found for the token
  • Status_Decode_Failed: status JSON is corrupted or cannot be decoded
  • Result_Not_Found: parsing result file does not exist
  • Result_Decode_Failed: result JSON is corrupted or cannot be decoded

Meaning: usually the task does not exist, the result was cleaned up, the task failed, or server-side cache files are abnormal.
What to do: make sure you use the same token returned at submission; wait for async tasks to finish before fetching; resubmit if the result is no longer available.

Language and page errors (400)

  • Lang_Unknown: document language cannot be detected
  • Lang_Not_Supported: the language is not supported by the OCR / parsing flow
  • Pages_Required: page selection is missing for endpoints that require it

Meaning: document language, scan quality, or page selection does not meet parsing requirements.
What to do: check that the document is readable; provide pages when required; improve scan quality or use a supported-language document.

User limit errors (400)

  • File_Size_Exceeded: file size exceeds account limits
  • PDF_Pages_Exceeded: PDF page count exceeds account limits
  • File_Type_Not_Allowed: file type is not allowed
  • Domain_Not_Allowed: URL download domain is not allowed

Meaning: related to account permissions, file size, page count, file type, or domain allowlist.
What to do: compress or split the file; use an allowed file type or domain; contact us for higher quotas or broader permissions.

System and processing errors (400/500)

  • OS_Error: file I/O, path, permission, or system resource error
  • Process_Failed: parsing subprocess failed, possibly caused by the model, input file, or runtime environment

Meaning: file I/O, model service, or task process failed.
What to do: keep the token and error response, retry later, and report the file, token, and error details if it keeps failing.

Authentication and authorization errors (401/402/403)

  • Authentication required: missing Authorization or X-API-Key header
  • Authentication failed: JWT Token or API-Key is invalid / expired
  • Insufficient balance: account balance is not enough
  • Administrator privileges required: endpoint requires admin privileges
  • Missing permission: account lacks specific parsing permissions

Meaning: headers, API-Key, balance, or permissions are insufficient.
What to do: check that X-API-Key is correct, unexpired, and has balance; contact us for administrator or special parsing permissions.

Rate limit errors (429)

  • Rate limit exceeded: <limit>: global or endpoint limit exceeded
  • Rate limit exceeded, <user_rate_limit>: user-level rate limit exceeded

Meaning: request frequency or concurrency exceeds limits.
What to do: reduce concurrency and retry frequency, add backoff, queue batch jobs, or request a higher rate limit.
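
Client code often needs to decide whether an error is worth retrying. The sketch below groups the flags from this reference and applies a simple retry rule. It assumes, without confirmation from this document, that the flag name appears as a substring of the error message; `classify_error` and `should_retry` are hypothetical helpers.

```python
# Error flags grouped as in the error-code reference in this section.
ERROR_CATEGORIES = {
    "request": {"Token_Required", "Token_Invalid", "Token_Duplicated", "File_Required",
                "File_Invalid", "URL_Invalid", "Request_Invalid"},
    "status": {"Status_Not_Found", "Status_Decode_Failed",
               "Result_Not_Found", "Result_Decode_Failed"},
    "language_pages": {"Lang_Unknown", "Lang_Not_Supported", "Pages_Required"},
    "user_limit": {"File_Size_Exceeded", "PDF_Pages_Exceeded",
                   "File_Type_Not_Allowed", "Domain_Not_Allowed"},
    "system": {"OS_Error", "Process_Failed"},
}


def classify_error(message):
    """Return the category of the first known error flag found in message, else None."""
    for category, flags in ERROR_CATEGORIES.items():
        if any(flag in message for flag in flags):
            return category
    return None


def should_retry(status_code, message=""):
    """Suggest a retry for rate limits, server errors, and system/processing failures."""
    return status_code == 429 or status_code >= 500 or classify_error(message) == "system"
```

Request, limit, and language errors are left as non-retryable because repeating the same request would be expected to fail the same way.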
V.

FAQ

Answers to common questions about integration methods, task status, quotas and limits, and result fields.

What is token, and why must it be unique?

token is the unique index for a parsing task and is used to query status and fetch results. Native API callers must ensure uniqueness, preferably with UUIDs. Reusing a token for an unfinished task may return Token_Duplicated.

Should I use UniParser-Tools or UniParser-API?

Use UniParser-Tools by default. It handles token generation, task submission, polling, and result fetching for you, which is suitable for scripts, batch jobs, and notebooks. Use the native UniParser-API only when you need direct control over HTTP requests, authentication, callbacks, concurrency, or field switches.

How should I choose between sync=true and sync=false?

sync=true waits for parsing to finish in the submit request. It is useful for small files, debugging, and examples. For large files, batch jobs, or high-concurrency workloads, use the default async mode sync=false, store the returned token, and fetch the result later.

What should I do if my file exceeds the limit?

Visitor standard accounts are limited to 200 MB and 2000 pages by default. Paid users or accounts with higher quotas default to 300 MB and 6000 pages, with the actual quota determined by the API application and account provisioning configuration. Compress or split the PDF, or request a higher quota.
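
A size check before upload can catch oversized files without spending a request. This is a sketch under assumptions: the helper names are hypothetical, the 200 MB default is the visitor limit quoted above, and the limits are assumed to be binary megabytes. Page count is not checked here because that would need a PDF library.

```python
import os

BYTES_PER_MB = 1024 * 1024  # assumption: the documented MB limits are binary megabytes


def exceeds_limit(size_bytes, limit_mb=200):
    """Return True if a size in bytes is above the limit in megabytes."""
    return size_bytes > limit_mb * BYTES_PER_MB


def file_exceeds_limit(path, limit_mb=200):
    """Return True if the file at path is larger than limit_mb megabytes."""
    return exceeds_limit(os.path.getsize(path), limit_mb)
```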

What should I do after a 429 rate-limit response?

The current rate limit is implemented as an approximate sliding-window mechanism. A 429 response means the request rate has exceeded the server-side limit for the current user, endpoint, or global entry point.

User-level rate limits mainly apply to two groups of APIs:

  • Parsing request APIs: endpoints that submit PDF, URL, snip, or image parsing tasks. These are limited according to the account rate S.
  • Result retrieval APIs: endpoints that fetch task status, formatted results, paragraphs, or view data. These are also limited per user, but their limits are usually much more lenient than those for parsing submissions, so clients can poll and fetch results in batches.

Some management, information retrieval, or public-entry APIs also have global rate limits. Global limits are not user-specific; they protect the backend from overload caused by abnormal traffic.

For client-side concurrency control, we recommend using a token-bucket algorithm. Tokens are added to the bucket at rate R, and the bucket capacity is B. Each request consumes one token before it is sent; if no token is available, the request should wait or remain queued. This smooths bursty batch traffic and avoids hitting server-side rate limits.

        Token generator (rate R)
              |
              v   (adds tokens continuously)
        +--------------+
        | Token bucket |  max capacity B
        | o o o        |
        +--------------+
              |
              v   (incoming request consumes a token)
        [ Pending request ] --------> [ Allowed ]
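
The token bucket described above can be sketched in a few lines of Python. This is a client-side illustration, not the server's implementation; `TokenBucket` is a hypothetical class, and suitable values for R and B depend on the rate limit of your account.

```python
import time


class TokenBucket:
    """Client-side token bucket: refills at `rate` tokens per second up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # R: tokens added per second
        self.capacity = float(capacity)  # B: maximum number of stored tokens
        self.tokens = float(capacity)    # start full, so a burst of B requests is allowed
        self.clock = clock
        self.updated = clock()

    def _refill(self):
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def try_acquire(self):
        """Consume one token if available; return False when the caller should wait."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def wait_time(self):
        """Seconds until the next token becomes available (0.0 if one is ready)."""
        self._refill()
        return max(0.0, (1.0 - self.tokens) / self.rate)
```

A sender loop would call `try_acquire()` before each request and sleep for `wait_time()` when it returns False. The class is not thread-safe as written; guard it with a lock if several threads share one bucket.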

If you still hit 429 frequently, reduce submission frequency and concurrency, and add backoff before retrying. For batch jobs, queue tasks and submit them asynchronously. If your business requires higher throughput, request a higher rate limit.

What is the difference between content, objects, pages_dict, and pages_tree?

content returns formatted full-document text, suitable for reading, search, and LLM input. objects returns a flat semantic block list for block-level coordinates, types, and text. pages_dict returns flattened page-level layouts. pages_tree returns page-level trees when parent-child hierarchy is needed.

Which field should I use for table results?

If you only need readable text, fetch content and set the table output format to markdown or html. If you need structured table objects, inspect table blocks in objects, pages_dict, or pages_tree. With table=1 (fast), you usually reconstruct tables from structure, placeholders, and contents; with table=2 (hq), structure usually contains the complete table data directly.

Should I set both textual and table?

Set them according to what you need. table controls table-region structure recognition and cell-content recognition; table captions, table footnotes, and body paragraphs are mainly controlled by textual. If you want both body text and tables to use higher quality, usually set both textual=2 and table=2.