lib.sanitize

See also

Registry-aware entry: SanitizeTransform in Docutils components.

Doctree sanitization for Django Docutils web-facing rendering.

These helpers strip HTML-unsafe nodes and attributes from a docutils document before it is written to HTML. SanitizeTransform exposes the same pass as a docutils transform so it can run inside a custom writer pipeline; sanitize_doctree() is the reusable entry point.

django_docutils.lib.sanitize._C0_CONTROL_CHARS: Final[frozenset[str]] = <...truncated, 269 chars>

C0 / DEL control characters that disqualify a URI outright.
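
The literal value of the set is truncated above. Assuming it is exactly the C0 range U+0000 through U+001F plus DEL (U+007F), which the description implies but this page does not confirm, an equivalent set can be built as follows. C0_AND_DEL and has_control_char are hypothetical names used only for illustration.

```python
from typing import Final

# Hypothetical reconstruction: assumes the set is U+0000..U+001F plus DEL.
C0_AND_DEL: Final[frozenset[str]] = frozenset(
    [chr(code) for code in range(0x20)] + ["\x7f"]
)


def has_control_char(uri: str) -> bool:
    """Return True if any character of *uri* is a C0 control or DEL."""
    return any(char in C0_AND_DEL for char in uri)
```

A membership test against a frozenset is constant-time per character, so a linear scan of the URI is cheap enough to run before any parsing.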

django_docutils.lib.sanitize._uri_is_allowed(uri, allowed_uri_schemes)

Return whether a URI can be emitted into HTML attributes.

Control characters are rejected before parsing: a scheme-invalid byte such as a vertical tab makes urlsplit report an empty scheme, so the URI would otherwise pass as a relative link. URIs that urlsplit refuses to parse (for example, malformed IPv6 brackets, which raise ValueError) are treated as disallowed.

Examples

>>> _uri_is_allowed("https://example.com", frozenset({"https"}))
True
>>> _uri_is_allowed("#section", frozenset())
True
>>> _uri_is_allowed("javascript:alert(1)", frozenset({"https"}))
False
>>> _uri_is_allowed("java\x0bscript:alert(1)", frozenset({"https"}))
False
>>> _uri_is_allowed("http://[::1", frozenset({"http"}))
False

Parameters:

uri (str)

allowed_uri_schemes (frozenset[str])

Return type:

bool
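
The checks described for _uri_is_allowed can be sketched with the standard library alone. This is a hypothetical stand-in (uri_is_allowed_sketch is not part of the module), and it assumes that an empty scheme always means an allowed relative link, as the "#section" example suggests.

```python
from urllib.parse import urlsplit

# Assumed to match _C0_CONTROL_CHARS: U+0000..U+001F plus DEL.
_CONTROL = frozenset(chr(code) for code in range(0x20)) | {"\x7f"}


def uri_is_allowed_sketch(uri: str, allowed_uri_schemes: frozenset[str]) -> bool:
    """Hypothetical re-implementation of the documented policy."""
    # 1. Reject control characters before parsing; urlsplit would otherwise
    #    report an empty scheme for input like "java\x0bscript:".
    if any(char in _CONTROL for char in uri):
        return False
    # 2. Treat URIs that urlsplit cannot parse as disallowed.
    try:
        scheme = urlsplit(uri).scheme  # urlsplit lower-cases the scheme
    except ValueError:
        return False
    # 3. No scheme means a relative reference such as "#section".
    return scheme == "" or scheme in allowed_uri_schemes
```

Step 1 is what catches the vertical-tab case: urlsplit("java\x0bscript:alert(1)").scheme is the empty string, so without the pre-check step 3 would accept the URI as relative.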

django_docutils.lib.sanitize._replace_node_with_text(node)

Replace a node with its rendered text content.

Examples

>>> paragraph = nodes.paragraph()
>>> reference = nodes.reference("", "", nodes.Text("link"))
>>> paragraph += reference
>>> _replace_node_with_text(reference)
>>> paragraph.astext()
'link'

Parameters:

node (Element)

Return type:

None

django_docutils.lib.sanitize._remove_node(node)

Remove a node from its parent if it is attached.

Examples

>>> paragraph = nodes.paragraph()
>>> raw = nodes.raw("", "<script></script>", format="html")
>>> paragraph += raw
>>> _remove_node(raw)
>>> len(paragraph.children)
0

Parameters:

node (Element)

Return type:

None
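
A sketch of the same guard in docutils terms, assuming "attached" means the node has a non-None parent. remove_if_attached is a hypothetical name, not the module's helper.

```python
from docutils import nodes


def remove_if_attached(node: nodes.Element) -> None:
    """Hypothetical sketch: drop *node* from its parent, if it has one."""
    parent = node.parent
    if parent is not None:
        # Element.remove() deletes the child from the parent's child list.
        parent.remove(node)
```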

django_docutils.lib.sanitize.sanitize_doctree(document, docutils_settings=None)

Remove unsafe HTML-producing nodes and attributes from a doctree.

Parameters:
  • document (docutils.nodes.document) – Doctree to sanitize in place.

  • docutils_settings (mapping, optional) – Already-resolved Docutils settings, consumed as-is. When None, project defaults are resolved via get_docutils_settings(). raw_enabled skips raw-node removal only when the project also sets allow_unsafe_docutils_settings. The URI scheme policy is configured project-wide via get_allowed_uri_schemes() and cannot be overridden per call.

Return type:

None

Examples

>>> document = nodes.document("", "")
>>> document += nodes.raw("", "<script></script>", format="html")
>>> sanitize_doctree(document)
>>> len(document.children)
0

class django_docutils.lib.sanitize.SanitizeTransform

Bases: Transform

Run sanitize_doctree() as a docutils transform.

DjangoDocutilsWriter sanitizes in translate(), so the pass always runs after every transform has been applied. This transform makes the same pass available to custom docutils pipelines that do not use that writer. Its default_priority is high, so when it is added to a writer's transform list it runs late: docutils applies transforms in ascending priority order.

Examples

>>> from django_docutils.lib.publisher import publish_doctree
>>> document = publish_doctree("Hello")
>>> document += nodes.raw("", "<script></script>", format="html")
>>> SanitizeTransform(document, document).apply()
>>> list(document.findall(nodes.raw))
[]