Parser

Converts DOCX XML into the Document model.

docwow.parser.docx_parser.parse_docx(source)

Parse a DOCX file and return a Document.

Parameters:

Name Type Description Default
source str | Path | bytes

Path to a .docx file (str or Path), or raw bytes of the zip archive (useful in tests and web upload handlers).

required

Returns:

Type Description
Document

A fully populated Document ready for rendering.
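The sketch below illustrates, using only the standard library, how a function that accepts `str | Path | bytes` might open the underlying zip archive. `open_docx_archive` and `make_minimal_docx_bytes` are hypothetical helpers written for this illustration; they are not part of docwow and do not describe how parse_docx is actually implemented.

```python
import io
import zipfile
from pathlib import Path


def open_docx_archive(source: "str | Path | bytes") -> zipfile.ZipFile:
    """Hypothetical helper: open a DOCX zip from a path or from raw bytes."""
    if isinstance(source, (bytes, bytearray)):
        # Raw bytes (e.g. an uploaded file body) are wrapped in an in-memory buffer.
        return zipfile.ZipFile(io.BytesIO(source))
    # str and Path are both treated as filesystem paths.
    return zipfile.ZipFile(Path(source))


def make_minimal_docx_bytes() -> bytes:
    """Build a tiny zip holding only word/document.xml, for demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", "<w:document/>")
    return buf.getvalue()


# Bytes input needs no temporary file, which is what makes it handy in tests.
with open_docx_archive(make_minimal_docx_bytes()) as archive:
    print(archive.namelist())
```

Accepting bytes directly means a test or upload handler can pass the payload it already holds in memory instead of writing it to disk first.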

docwow.parser.body_parser.parse_body(body, zf, relationships, style_num_map=None)

Parse the document body and return a tuple of BodyElement items (Paragraph | Table).

Parameters:

Name Type Description Default
style_num_map dict[str, tuple[str, int]] | None

Mapping of style_id → (num_id, ilvl) for styles that embed their numbering definition. Built by style_parser.parse_style_numbering() and used to resolve list membership when a paragraph's own pPr has no w:numPr (the common python-docx / Word pattern).

None
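The fallback described for style_num_map can be sketched as follows. This is a minimal illustration under stated assumptions, not docwow's code: `resolve_numbering` is a hypothetical helper, and the style id "ListBullet" and the map contents are made-up sample data. A paragraph's own w:numPr wins when present; otherwise the paragraph style id is looked up in the map.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
VAL = f"{{{W}}}val"  # namespaced attribute key for w:val


def resolve_numbering(p_elem, style_num_map=None):
    """Hypothetical helper: return (num_id, ilvl) for a w:p element, or None."""
    ppr = p_elem.find("w:pPr", NS)
    if ppr is None:
        return None
    # 1. Direct numbering on the paragraph takes priority.
    num_pr = ppr.find("w:numPr", NS)
    if num_pr is not None:
        num_id = num_pr.find("w:numId", NS)
        ilvl = num_pr.find("w:ilvl", NS)
        if num_id is not None:
            level = int(ilvl.get(VAL)) if ilvl is not None else 0
            return num_id.get(VAL), level
    # 2. Otherwise fall back to numbering embedded in the paragraph style.
    p_style = ppr.find("w:pStyle", NS)
    if p_style is not None and style_num_map:
        return style_num_map.get(p_style.get(VAL))
    return None


# A list paragraph that carries only a style reference, no w:numPr.
PARAGRAPH_XML = f"""<w:p xmlns:w="{W}">
  <w:pPr><w:pStyle w:val="ListBullet"/></w:pPr>
  <w:r><w:t>First item</w:t></w:r>
</w:p>"""

paragraph = ET.fromstring(PARAGRAPH_XML)
style_num_map = {"ListBullet": ("1", 0)}  # sample data: style_id -> (num_id, ilvl)
print(resolve_numbering(paragraph, style_num_map))  # ('1', 0)
```

Without the map, the same paragraph would resolve to None and be treated as plain text rather than a list item.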

docwow.parser.style_parser.parse_styles(styles_xml)

Parse the raw bytes of word/styles.xml and return all Style objects.

docwow.parser.numbering_parser.parse_numbering(numbering_xml)

Parse word/numbering.xml and return one NumberingDefinition per w:num element.
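For orientation, here is a rough sketch of the structure such a parser walks, assuming "num" refers to the w:num elements of numbering.xml. Each w:num points at a shared w:abstractNum through w:abstractNumId. `list_num_ids` and the sample XML are hypothetical and only show the shape of the data; the fields of NumberingDefinition are not reproduced here.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
VAL = f"{{{W}}}val"
NUM_ID = f"{{{W}}}numId"

# Sample numbering part: two w:num instances sharing one abstract definition.
NUMBERING_XML = f"""<w:numbering xmlns:w="{W}">
  <w:abstractNum w:abstractNumId="0">
    <w:lvl w:ilvl="0"><w:numFmt w:val="bullet"/></w:lvl>
  </w:abstractNum>
  <w:num w:numId="1"><w:abstractNumId w:val="0"/></w:num>
  <w:num w:numId="2"><w:abstractNumId w:val="0"/></w:num>
</w:numbering>""".encode("utf-8")


def list_num_ids(numbering_xml: bytes) -> dict:
    """Hypothetical helper: map each w:num's numId to its abstractNumId."""
    root = ET.fromstring(numbering_xml)
    result = {}
    for num in root.findall("w:num", NS):
        abstract = num.find("w:abstractNumId", NS)
        result[num.get(NUM_ID)] = abstract.get(VAL) if abstract is not None else None
    return result


print(list_num_ids(NUMBERING_XML))  # {'1': '0', '2': '0'}
```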

docwow.parser.image_parser.extract_image(zf, relationship_id, relationships, cx_emu, cy_emu, alt_text='')

Build an InlineImage from a zip file and relationship metadata.

Parameters:

Name Type Description Default
zf ZipFile

Open DOCX zip archive.

required
relationship_id str

The rId from the drawing XML (e.g. "rId5").

required
relationships dict[str, str]

Mapping from parse_relationships().

required
cx_emu int

Image width in EMU (from wp:extent/@cx).

required
cy_emu int

Image height in EMU (from wp:extent/@cy).

required
alt_text str

Accessibility description (may be empty).

''

Returns:

Type Description
InlineImage | None

InlineImage, or None if the relationship target can't be found in the zip (e.g. a linked rather than embedded image).
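The lookup and the None case can be sketched with the standard library. Everything named here is hypothetical: `ImageSketch` stands in for InlineImage (whose real fields are not documented on this page), `extract_image_sketch` is not docwow's implementation, and the sketch assumes relationship targets are relative to the word/ directory. The EMU conversion relies on the OOXML definition of 914400 EMU per inch; the 96 DPI default is an assumption for illustration.

```python
import io
import zipfile
from dataclasses import dataclass
from typing import Optional

EMU_PER_INCH = 914400  # English Metric Units per inch (OOXML definition)


def emu_to_px(emu: int, dpi: int = 96) -> int:
    """Convert EMU to pixels at the given DPI (96 is an assumed default)."""
    return round(emu * dpi / EMU_PER_INCH)


@dataclass
class ImageSketch:
    """Hypothetical stand-in for InlineImage."""
    data: bytes
    width_px: int
    height_px: int
    alt_text: str


def extract_image_sketch(zf, relationship_id, relationships,
                         cx_emu, cy_emu, alt_text="") -> Optional[ImageSketch]:
    target = relationships.get(relationship_id)
    if target is None:
        return None  # unknown rId
    # Assumption: targets are relative to word/, e.g. "media/image1.png".
    member = "word/" + target
    if member not in zf.namelist():
        return None  # target is not in the archive, e.g. a linked image
    return ImageSketch(zf.read(member), emu_to_px(cx_emu), emu_to_px(cy_emu), alt_text)


# Build a tiny in-memory archive containing one fake embedded image.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as out:
    out.writestr("word/media/image1.png", b"fake-png-bytes")
zf = zipfile.ZipFile(io.BytesIO(buf.getvalue()))

relationships = {
    "rId5": "media/image1.png",                # embedded in the zip
    "rId6": "https://example.com/linked.png",  # linked, not in the zip
}
embedded = extract_image_sketch(zf, "rId5", relationships, 914400, 457200, "Logo")
linked = extract_image_sketch(zf, "rId6", relationships, 914400, 457200)
print(embedded.width_px, embedded.height_px)  # 96 48
print(linked)  # None
```

Returning None rather than raising lets the caller skip images it cannot resolve and keep parsing the rest of the document.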