
Public API

The top-level docwow module exposes seven functions covering the most common workflows.

docwow.open(source)

Parse a DOCX file or a docwow HTML string into a docwow.api.document.DocumentWrapper.

Parameters:

Name Type Description Default
source str | Path | bytes

A path to a .docx file (str or pathlib.Path), the raw bytes of a .docx file, or an HTML string produced by render_document.

required

Returns:

Type Description
DocumentWrapper

A docwow.api.document.DocumentWrapper instance.

docwow.to_html(source, page_view=False)

Convert a DOCX file to a self-contained HTML string.

Parameters:

Name Type Description Default
source str | Path | bytes

Path to a .docx file, or raw DOCX bytes.

required
page_view bool

When True, styles the output as a physical page and adds @media print / @page rules for correct browser printing and PDF export.

False

Returns:

Type Description
str

UTF-8 HTML string produced by render_document.
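
A common use of to_html is a one-shot export to disk. The following is a minimal sketch, assuming docwow is installed and relying only on the signature documented above. The helper name export_html and its convert parameter are hypothetical; convert exists only so the helper can be exercised with a stand-in converter.

```python
from pathlib import Path


def export_html(docx_path, out_path, page_view=False, convert=None):
    """Convert a .docx file to HTML and write the result to out_path.

    Hypothetical helper, not part of docwow.
    """
    if convert is None:
        # Assumption: docwow is installed and to_html(source, page_view=False)
        # behaves as documented above.
        import docwow
        convert = docwow.to_html
    html = convert(Path(docx_path), page_view=page_view)
    out = Path(out_path)
    # The documented return value is an HTML string, so it is written as UTF-8 text.
    out.write_text(html, encoding="utf-8")
    return out
```

Calling export_html("report.docx", "report.html", page_view=True) would produce a file styled for printing, provided the page_view behaviour matches the description above.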

docwow.to_docx(html, target=None, *, is_foreign_html=False, fetch_images=False, fetch_external_css=False)

Convert an HTML string to a DOCX file.

Parameters:

Name Type Description Default
html str | bytes

HTML string or bytes.

required
target str | Path | None

Optional output path. When provided the bytes are also written to disk.

None
is_foreign_html bool

Set to True to convert arbitrary HTML from any source (CMS, rich text editor, web page, etc.). When False (default), the HTML must have been produced by docwow; passing foreign HTML without this flag raises ValueError.

False
fetch_images bool

When True, remote <img src="https://..."> URLs are downloaded and embedded. When False (default), remote images are skipped and a docwow.DocwowConversionWarning is issued. Only used when is_foreign_html=True.

False
fetch_external_css bool

When True, <link rel="stylesheet"> URLs are downloaded and applied. Default False — external stylesheets are ignored with a warning. Only used when is_foreign_html=True.

False

Returns:

Type Description
bytes

Raw DOCX bytes (a valid ZIP archive).

Raises:

Type Description
ValueError

If is_foreign_html=False and the HTML does not appear to be docwow output (no dw-document element found).
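
Because to_docx rejects HTML that lacks a dw-document element, a caller handling mixed input may want a cheap pre-check before choosing the is_foreign_html flag. The sketch below uses only the standard library and is a heuristic, not docwow's own detection. The reference does not say whether dw-document is a tag name, a class, or an id, so the sketch assumes any of the three counts. The names looks_like_docwow_html and _DwDocumentFinder are hypothetical.

```python
from html.parser import HTMLParser


class _DwDocumentFinder(HTMLParser):
    """Sets .found when an element looks like a docwow document root."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Valueless attributes arrive as None, hence the `or ""`.
        classes = (attr_map.get("class") or "").split()
        # Assumption: the marker may be a tag name, a class, or an id.
        if (
            tag == "dw-document"
            or "dw-document" in classes
            or attr_map.get("id") == "dw-document"
        ):
            self.found = True


def looks_like_docwow_html(html):
    """Heuristic pre-check; docwow's own detection may differ."""
    if isinstance(html, bytes):
        html = html.decode("utf-8", errors="replace")
    finder = _DwDocumentFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

A caller could then pass is_foreign_html=not looks_like_docwow_html(html) to docwow.to_docx. The trade-off is that damaged docwow output would be silently treated as foreign HTML rather than raising ValueError.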

docwow.parse_docx(source)

Parse a DOCX file and return a Document.

Parameters:

Name Type Description Default
source str | Path | bytes

Path to a .docx file (str or Path), or raw bytes of the zip archive (useful in tests and web upload handlers).

required

Returns:

Type Description
Document

A fully populated Document ready for rendering.

docwow.parse_html(source)

Parse a docwow HTML string back into a Document model.

Parameters:

Name Type Description Default
source str | bytes

HTML produced by render_document(), as a string or UTF-8 bytes.

required

Returns:

Type Description
Document

A docwow.models.document.Document whose body, geometry, styles, and numbering reflect the content of the HTML.

Raises:

Type Description
ValueError

If the HTML does not contain a dw-document element.
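
Taken together, parse_docx, render_document, parse_html, and write_docx describe a round trip from DOCX through HTML and back. The sketch below chains them using only the signatures documented on this page. The function name roundtrip_via_html and the api parameter are hypothetical: api is expected to be the imported docwow module and is passed in only so the sketch can be exercised with a stand-in object.

```python
def roundtrip_via_html(docx_bytes, api):
    """DOCX bytes -> Document -> HTML -> Document -> DOCX bytes.

    `api` is assumed to be the docwow module (or any object exposing the
    same four functions); this helper is illustrative, not part of docwow.
    """
    doc = api.parse_docx(docx_bytes)   # DOCX -> Document
    html = api.render_document(doc)    # Document -> docwow HTML
    restored = api.parse_html(html)    # docwow HTML -> Document
    return api.write_docx(restored)    # Document -> DOCX bytes
```

With docwow installed, the call would be roundtrip_via_html(data, docwow). Whether the output is byte-identical to the input is not stated in this reference, so the result should be treated as an equivalent document at best.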

docwow.render_document(doc, embed_images=True, page_view=False)

Render a Document to a complete, self-contained HTML string.

Parameters:

Name Type Description Default
doc Document

The document model to render.

required
embed_images bool

When True (default), images are embedded as base64 data URIs. When False, a placeholder src is used (useful for testing without large base64 blobs).

True
page_view bool

When True, adds CSS that styles the document as a physical page (gray background, drop shadow) and injects an @media print block with @page size/margin rules so the browser paginates correctly when printing or exporting to PDF.

False

Returns:

Type Description
str

A UTF-8 HTML string starting with <!DOCTYPE html>.
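
The embed_images option relies on base64 data URIs, a general web mechanism rather than anything docwow-specific. As an illustration of the format only (this is not docwow's implementation, and the name to_data_uri and the default MIME type are assumptions), a data URI can be built with the standard library:

```python
import base64


def to_data_uri(image_bytes, mime_type="image/png"):
    """Encode raw image bytes as a base64 data URI usable in <img src="...">."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

Base64 encoding grows the payload by roughly one third, which is one reason embed_images=False is convenient in tests.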

docwow.write_docx(doc, target=None)

Write a Document to a DOCX byte string.

Parameters:

Name Type Description Default
doc Document

The document model to serialise.

required
target str | Path | None

Optional file path. When provided the bytes are also written to disk.

None

Returns:

Type Description
bytes

The raw DOCX bytes (a valid ZIP archive).
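
Since both write_docx and to_docx return a ZIP archive as bytes, the result can be sanity-checked without touching disk. The following standard-library sketch is one way to do that. The function name looks_like_docx is hypothetical, and the check that word/document.xml is present assumes the conventional location of the main document part.

```python
import io
import zipfile


def looks_like_docx(data):
    """Return True if `data` is a ZIP archive containing the usual DOCX parts."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False
    with zipfile.ZipFile(buffer) as archive:
        names = set(archive.namelist())
    # [Content_Types].xml is required in every OPC package; word/document.xml
    # is the conventional main part name (assumed here, not guaranteed).
    return {"[Content_Types].xml", "word/document.xml"} <= names
```

A check like this is useful in a web upload handler or in tests, where a quick rejection of non-DOCX bytes is cheaper than a full parse.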