DITA Community Internationalization Plugin

The DITA Community internationalization plugin, org.dita-community.i18n, provides locale-specific sorting, grouping, line breaking, word breaking, and estimation of rendered text length.

It was designed to support index generation and glossary sorting in DITA documents, but it can be used in any XSLT or Java program.

Word breaking is required to sort Simplified Chinese (zh-CN) correctly, but it is also useful for other tasks. Examples include inserting zero-width spaces so that lines can break at appropriate points, and measuring the width of the first word of a string in order to calculate column widths or definition-list term indents.
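The following minimal sketch shows the two word-breaking tasks just described, using only JDK classes. It relies on `java.text.BreakIterator` instead of the ICU break iterators that the plugin wraps. ICU4J's `com.ibm.icu.text.BreakIterator` has a very similar API, but the two can return different boundaries for scripts that need a dictionary, such as Thai or Chinese. The names `WordBreakSketch`, `isWordChar`, `insertZeroWidthSpaces`, and `firstWord` are hypothetical and are not part of the plugin's API. The decision to insert a zero-width space only where a word character sits on both sides of a boundary is also an assumption.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class WordBreakSketch {

    /** Zero-width space (U+200B): an invisible line-break opportunity. */
    static final char ZWSP = '\u200B';

    /** Letters, digits, and combining marks (e.g. Thai vowel signs) count as word characters. */
    static boolean isWordChar(char c) {
        int type = Character.getType(c);
        return Character.isLetterOrDigit(c)
                || type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK;
    }

    /** Inserts a ZWSP at each word boundary that has a word character on both sides. */
    static String insertZeroWidthSpaces(String text, Locale locale) {
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);
        StringBuilder out = new StringBuilder(text.length() + 16);
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            out.append(text, start, end);
            // Boundaries next to spaces or punctuation already allow (or forbid) breaks.
            if (end < text.length()
                    && isWordChar(text.charAt(end - 1))
                    && isWordChar(text.charAt(end))) {
                out.append(ZWSP);
            }
        }
        return out.toString();
    }

    /** Returns the first segment containing a letter or digit, or "" if there is none. */
    static String firstWord(String text, Locale locale) {
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                return segment;
            }
        }
        return "";
    }

    public static void main(String[] args) {
        // The first word is what a term-indent calculation would measure.
        System.out.println(firstWord("  Definition list term", Locale.ENGLISH)); // Definition
        System.out.println(insertZeroWidthSpaces("hello world", Locale.ENGLISH).length());
    }
}
```

In space-separated text such as English, every word boundary already sits next to a space or punctuation mark, so the text comes back unchanged. The insertion only has an effect for scripts that are written without spaces between words.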

Line breaking is useful for wrapping text into lines. For example, it can be used to estimate the rendered width and height of a block of text so that the text can then be paginated or allocated to fixed-size regions in XSL-FO or HTML output.
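The sketch below wraps text greedily at the break opportunities reported by a line `BreakIterator`, then turns the result into rough size estimates. It assumes a crude monospaced model in which every character has the same width and every line has the same height. A real estimate would use font metrics. `java.text.BreakIterator` stands in for ICU's line breaker, and the names `LineWrapSketch`, `wrap`, `estimateHeight`, and `estimateWidth` are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LineWrapSketch {

    /**
     * Greedy wrapping at line-break opportunities. Width is counted in chars
     * (an assumption); a segment longer than maxChars stays whole and overflows.
     */
    static List<String> wrap(String text, int maxChars, Locale locale) {
        BreakIterator breaks = BreakIterator.getLineInstance(locale);
        breaks.setText(text);
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        int start = breaks.first();
        for (int end = breaks.next(); end != BreakIterator.DONE; start = end, end = breaks.next()) {
            // A segment runs up to the next break opportunity, e.g. "quick ".
            String segment = text.substring(start, end);
            // Trailing spaces may hang into the margin, so they are not measured.
            String candidate = (line + segment).stripTrailing();
            if (line.length() > 0 && candidate.length() > maxChars) {
                lines.add(line.toString().stripTrailing());
                line.setLength(0);
            }
            line.append(segment);
        }
        if (line.length() > 0) {
            lines.add(line.toString().stripTrailing());
        }
        return lines;
    }

    /** Estimated block height: number of wrapped lines times a fixed line height. */
    static double estimateHeight(String text, int maxChars, double lineHeight, Locale locale) {
        return wrap(text, maxChars, locale).size() * lineHeight;
    }

    /** Estimated block width in chars: the length of the longest wrapped line. */
    static int estimateWidth(String text, int maxChars, Locale locale) {
        return wrap(text, maxChars, locale).stream().mapToInt(String::length).max().orElse(0);
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog";
        wrap(text, 10, Locale.ENGLISH).forEach(System.out::println);
        System.out.println(estimateHeight(text, 10, 12.0, Locale.ENGLISH)); // 60.0
    }
}
```

For example, wrapping "The quick brown fox jumps over the lazy dog" at 10 characters produces five lines: "The quick", "brown fox", "jumps over", "the lazy", and "dog". With a 12-point line height, the estimated height is therefore 60 points.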

A key feature of this library is full support for grouping and sorting Simplified Chinese. This is particularly challenging, because Simplified Chinese cannot be sorted correctly with the default Java or ICU collators.

Sorting Simplified Chinese requires a dictionary. This plugin provides an open-source, dictionary-based solution built on the open-source CC-CEDICT dictionary (see http://www.mdbg.net/chindict).
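To show why a dictionary is needed, here is a hypothetical sketch that is not the plugin's implementation. It reads entries in the CC-CEDICT line format, `Traditional Simplified [pin1 yin1] /gloss/`, and sorts Simplified Chinese headwords by their numbered pinyin reading. The sketch makes three assumptions:

  • Each term is a single dictionary headword. A term that is not a headword would first have to be segmented into words, which is why word breaking matters for zh-CN sorting.
  • The first reading listed for a term is used.
  • A term missing from the dictionary uses its own characters as the sort key.

The names `PinyinSortSketch`, `loadReadings`, `sortKey`, and `sortByPinyin` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PinyinSortSketch {

    // CC-CEDICT entry: "Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/"
    private static final Pattern ENTRY =
            Pattern.compile("^(\\S+) (\\S+) \\[([^\\]]*)\\] /.*/$");

    /** Maps each Simplified headword to its pinyin, keeping the first reading seen. */
    static Map<String, String> loadReadings(List<String> cedictLines) {
        Map<String, String> readings = new HashMap<>();
        for (String line : cedictLines) {
            if (line.startsWith("#")) {
                continue; // header and comment lines
            }
            Matcher m = ENTRY.matcher(line.strip());
            if (m.matches()) {
                readings.putIfAbsent(m.group(2), m.group(3));
            }
        }
        return readings;
    }

    /** Sort key: lower-cased numbered pinyin ("zhong1 guo2"), else the term itself. */
    static String sortKey(String term, Map<String, String> readings) {
        String pinyin = readings.get(term);
        return pinyin != null ? pinyin.toLowerCase(Locale.ROOT) : term;
    }

    /** Sorts by pinyin key; terms with identical keys fall back to String.compareTo. */
    static List<String> sortByPinyin(List<String> terms, Map<String, String> readings) {
        Comparator<String> byPinyin = Comparator.comparing((String t) -> sortKey(t, readings));
        List<String> sorted = new ArrayList<>(terms);
        sorted.sort(byPinyin.thenComparing(Comparator.naturalOrder()));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> cedict = List.of(
                "# CC-CEDICT sample (four entries)",
                "中國 中国 [Zhong1 guo2] /China/",
                "北京 北京 [Bei3 jing1] /Beijing/",
                "上海 上海 [Shang4 hai3] /Shanghai/",
                "蘋果 苹果 [ping2 guo3] /apple/");
        Map<String, String> readings = loadReadings(cedict);
        System.out.println(sortByPinyin(List.of("中国", "北京", "上海", "苹果"), readings));
        // [北京, 苹果, 上海, 中国]
    }
}
```

The four sample entries sort as 北京, 苹果, 上海, 中国 (bei, ping, shang, zhong). Plain code-point comparison of the same strings would produce a different order.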

The plugin provides the following services:

  • A customizable collator for use with XSLT, including support for dictionary-based grouping and sorting of Simplified Chinese.

  • A general grouping service useful for grouping index terms, glossary entries, etc.

  • Access to ICU's word-breaking and line-breaking facilities from XSLT. This is useful for word breaking in languages such as Thai and Chinese that require a dictionary or writing-system-specific rules.
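Index and glossary grouping generally follows a collate-then-group pattern. The sketch below is a rough illustration of that pattern. It sorts terms with the JDK's locale-sensitive `java.text.Collator` and then groups them under an upper-case first letter with diacritics removed. The names `IndexGroupingSketch`, `groupLabel`, and `groupTerms` are hypothetical, as is the "#" group for terms that do not start with a letter. The plugin's own collator and grouping service are customizable and are not shown here.

```java
import java.text.Collator;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class IndexGroupingSketch {

    /** Group label: first character, diacritics stripped, upper-cased; "#" for non-letters. */
    static String groupLabel(String term, Locale locale) {
        if (term.isEmpty()) {
            return "#";
        }
        String first = term.substring(0, term.offsetByCodePoints(0, 1));
        // NFD splits "É" into "E" + combining acute; the mark is then removed.
        String base = Normalizer.normalize(first, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
        String label = base.toUpperCase(locale);
        return (!label.isEmpty() && Character.isLetter(label.codePointAt(0))) ? label : "#";
    }

    /** Sorts with the locale's Collator, then groups terms, in sorted order, under their labels. */
    static Map<String, List<String>> groupTerms(List<String> terms, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        // Decompose accented characters so that "É" collates as a variant of "E".
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        List<String> sorted = new ArrayList<>(terms);
        sorted.sort(collator);
        Map<String, List<String>> groups = new LinkedHashMap<>(); // keeps first-seen group order
        for (String term : sorted) {
            groups.computeIfAbsent(groupLabel(term, locale), k -> new ArrayList<>()).add(term);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("zebra", "Éclair", "apple", "eagle", "Banana", "échelle");
        groupTerms(terms, Locale.ENGLISH)
                .forEach((label, group) -> System.out.println(label + ": " + group));
    }
}
```

Stripping accents to form group labels is an assumption that suits English but not every language. Swedish, for example, treats å, ä, and ö as separate letters that follow z, so each of them needs its own group. Rules like these are where a customizable grouping service becomes necessary.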

The plugin supports Open Toolkit version 1.8.5 and 2.x versions later than approximately 2.1.1. It works with Saxon 9.1, which is the version shipped with the Open Toolkit through version 2.4.

As of release 1.0.0, support for Saxon 9.6 and later is under development. Saxon versions after 9.1 changed how extension functions are supported in the free edition of Saxon (Saxon HE). As a result, the extension-function integration written for Saxon 9.1 does not work with later versions, which require a different integration mechanism.