Parser
Converts DOCX XML into the Document model.
docwow.parser.docx_parser.parse_docx(source)
Parse a DOCX file and return a Document.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str \| Path \| bytes | Path to a .docx file (str or Path), or raw bytes of the zip archive (useful in tests and web upload handlers). | required |
Returns:
| Type | Description |
|---|---|
| Document | A fully populated Document ready for rendering. |
docwow.parser.body_parser.parse_body(body, zf, relationships, style_num_map=None)
Parse the document body element.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| style_num_map | dict[str, tuple[str, int]] \| None | Mapping of style_id → (num_id, ilvl) for styles that embed their numbering definition. Built by style_parser.parse_style_numbering() and used to resolve list membership when a paragraph's own pPr has no w:numPr (the common python-docx / Word pattern). | None |
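
The fallback described for `style_num_map` can be sketched as a two-step lookup. This is an illustration under stated assumptions, not the library's code: `resolve_numbering` and its `direct_num_pr` and `style_id` parameters are hypothetical names for values that would be read from the paragraph's `pPr`.

```python
from typing import Optional

# Hypothetical alias for the (num_id, ilvl) pair described above.
NumRef = tuple[str, int]


def resolve_numbering(
    direct_num_pr: Optional[NumRef],
    style_id: Optional[str],
    style_num_map: Optional[dict[str, NumRef]] = None,
) -> Optional[NumRef]:
    """Return the paragraph's numbering reference, or None if it is not a list item."""
    if direct_num_pr is not None:
        # A w:numPr on the paragraph itself is used when present.
        return direct_num_pr
    if style_num_map and style_id is not None:
        # Otherwise fall back to numbering embedded in the paragraph style.
        return style_num_map.get(style_id)
    return None


if __name__ == "__main__":
    style_num_map = {"ListBullet": ("1", 0)}
    print(resolve_numbering(None, "ListBullet", style_num_map))  # ('1', 0)
```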
docwow.parser.style_parser.parse_styles(styles_xml)
Parse the raw bytes of word/styles.xml and return all Style objects.
docwow.parser.numbering_parser.parse_numbering(numbering_xml)
Parse word/numbering.xml and return one NumberingDefinition per num.
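
To make the "one per num" relationship concrete, the sketch below reads each `w:num` element and records which `w:abstractNum` it points to. It uses only `xml.etree.ElementTree`; `map_num_to_abstract` and `SAMPLE` are hypothetical, and the real `NumberingDefinition` objects presumably carry more than this mapping.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/numbering.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def map_num_to_abstract(numbering_xml: bytes) -> dict[str, str]:
    """Hypothetical helper: map each w:num's numId to its abstractNumId."""
    root = ET.fromstring(numbering_xml)
    result: dict[str, str] = {}
    for num in root.findall("w:num", NS):
        num_id = num.get(f"{{{W_NS}}}numId")
        abstract = num.find("w:abstractNumId", NS)
        if num_id is None or abstract is None:
            continue  # Skip malformed entries rather than raising.
        abstract_id = abstract.get(f"{{{W_NS}}}val")
        if abstract_id is not None:
            result[num_id] = abstract_id
    return result


# Minimal hand-written sample, not taken from a real document.
SAMPLE = (
    b'<w:numbering xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    b'<w:abstractNum w:abstractNumId="0"/>'
    b'<w:num w:numId="1"><w:abstractNumId w:val="0"/></w:num>'
    b'<w:num w:numId="2"><w:abstractNumId w:val="0"/></w:num>'
    b"</w:numbering>"
)

if __name__ == "__main__":
    print(map_num_to_abstract(SAMPLE))
```

Two `w:num` entries may share one `w:abstractNum`, which is why the result is keyed by `numId` rather than by the abstract definition.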
docwow.parser.image_parser.extract_image(zf, relationship_id, relationships, cx_emu, cy_emu, alt_text='')
Build an InlineImage from a zip file and relationship metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| zf | ZipFile | Open DOCX zip archive. | required |
| relationship_id | str | The rId from the drawing XML (e.g. "rId5"). | required |
| relationships | dict[str, str] | Mapping from parse_relationships(). | required |
| cx_emu | int | Image width in EMU (from wp:extent/@cx). | required |
| cy_emu | int | Image height in EMU (from wp:extent/@cy). | required |
| alt_text | str | Accessibility description (may be empty). | '' |
Returns:
| Type | Description |
|---|---|
| InlineImage \| None | InlineImage, or None if the relationship target can't be found in the zip (e.g. a linked rather than embedded image). |
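
The `None` return and the EMU dimensions can be illustrated with a small standard-library sketch. It is not docwow's implementation: `emu_to_px` and `read_image_bytes` are hypothetical helpers, and the code assumes that relationship targets are stored relative to the `word/` directory (for example `media/image1.png`).

```python
import io
import zipfile
from typing import Optional

EMU_PER_INCH = 914_400  # English Metric Units per inch in OOXML


def emu_to_px(emu: int, dpi: int = 96) -> int:
    """Convert an EMU length to whole pixels at the given DPI."""
    return round(emu * dpi / EMU_PER_INCH)


def read_image_bytes(
    zf: zipfile.ZipFile, relationship_id: str, relationships: dict[str, str]
) -> Optional[bytes]:
    """Hypothetical helper: return embedded image bytes, or None if absent."""
    target = relationships.get(relationship_id)
    if target is None:
        return None  # Unknown relationship id.
    # Assumption: targets are relative to the word/ directory.
    member = f"word/{target}"
    if member not in zf.namelist():
        return None  # e.g. a linked (external) image with no zip member.
    return zf.read(member)


if __name__ == "__main__":
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("word/media/image1.png", b"\x89PNG")
    with zipfile.ZipFile(buffer) as zf:
        print(read_image_bytes(zf, "rId5", {"rId5": "media/image1.png"}))
    print(emu_to_px(914_400))  # 96
```

Returning `None` instead of raising lets a caller skip linked images and continue with the rest of the document.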