Doing Line and Word Breaking in XSLT

The i18n library provides XSLT access to the ICU4J line- and word-breaking functions.

The i18n XSLT utility library provides functions for locale-aware word and line breaking of strings. These are useful for tasks such as estimating the rendered length of strings, generating word indexes, and finding appropriate points at which to insert zero-width spaces in literal examples that would otherwise not wrap.
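As an illustration of the zero-width-space use case, the sketch below inserts U+200B at every line-break opportunity that does not already follow whitespace. It is a minimal sketch, not the plugin's implementation: it assumes plain Java with the JDK's own `java.text.BreakIterator` rather than ICU4J (ICU4J's `com.ibm.icu.text.BreakIterator` offers the same `getLineInstance`, `setText`, `first`, and `next` methods), and the class and method names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper; uses the JDK BreakIterator as a stand-in for ICU4J.
public class ZeroWidthSpaceSketch {

    private static final char ZWSP = '\u200B'; // ZERO WIDTH SPACE

    // Appends a zero-width space after each line-break opportunity that is
    // not at the end of the text and not already preceded by whitespace.
    static String insertZeroWidthSpaces(String text, Locale locale) {
        BreakIterator it = BreakIterator.getLineInstance(locale);
        it.setText(text);
        StringBuilder out = new StringBuilder(text.length() + 16);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.append(text, start, end);
            boolean atEnd = end == text.length();
            boolean afterSpace = Character.isWhitespace(text.charAt(end - 1));
            if (!atEnd && !afterSpace) {
                out.append(ZWSP);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String literal = "very-long-hyphenated-identifier-in-a-code-sample";
        String wrapped = insertZeroWidthSpaces(literal, Locale.ENGLISH);
        // The visible text is unchanged; only invisible break hints are added.
        System.out.println(wrapped.replace(String.valueOf(ZWSP), "").equals(literal)); // prints true
    }
}
```

Exactly where the break opportunities fall depends on the break rules of the implementation in use, so ICU4J may place them differently than the JDK for the same input.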

The file test/xsl/test-collator.xsl in the i18n plugin demonstrates how to use the collator and the line- and word-breaking utility functions.

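For example, to split the text of the current element into words, you can do something like this:
<!-- Split the whitespace-normalized text of the context node into words,
     using the language code held in $langCode -->
<xsl:variable name="words" as="xs:string*"
  select="dci18n:splitWords(normalize-space(.), $langCode)"
/>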
<xsl:for-each select="$words">
  <p><span class="word"><xsl:value-of select="."/></span></p>
</xsl:for-each>

The dci18n:splitWords() function returns only words, omitting any whitespace, punctuation, or other non-word characters. It returns at least one word unless the input string is empty or consists only of non-word characters.
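The filtering behavior described above can be sketched in Java: walk the word boundaries and keep only segments that contain a letter or digit. This is a hypothetical analogue of dci18n:splitWords(), not the plugin's code. It assumes the JDK's `java.text.BreakIterator` in place of ICU4J, and the letter-or-digit test is an assumed stand-in for whatever word classification the real function applies.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical analogue of dci18n:splitWords(); JDK BreakIterator, not ICU4J.
public class SplitWordsSketch {

    static List<String> splitWords(String text, Locale locale) {
        List<String> words = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // Assumed rule: a segment is a word if it has a letter or digit,
            // so whitespace and punctuation segments are dropped.
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                words.add(segment);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(splitWords("Hello, world!", Locale.ENGLISH)); // prints [Hello, world]
        System.out.println(splitWords(" ... ", Locale.ENGLISH));         // prints []
    }
}
```

As with the XSLT function, an empty string or a string made only of non-word characters yields an empty result.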

The dci18n:splitLines() function works the same way but splits at line-break points: it returns a sequence of strings in which each string contains the text from one line-break point to the next (the first string runs from the start of the input to the first break point). Unlike dci18n:splitWords(), it keeps whitespace and punctuation, so the result contains every character of the input string.
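A minimal Java sketch of the same idea follows, assuming the JDK's `java.text.BreakIterator` line instance as a stand-in for ICU4J; the class and method names are hypothetical. Because every segment between consecutive break points is kept, joining the segments gives back the original string.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical analogue of dci18n:splitLines(); JDK BreakIterator, not ICU4J.
public class SplitLinesSketch {

    static List<String> splitLines(String text, Locale locale) {
        List<String> segments = new ArrayList<>();
        BreakIterator it = BreakIterator.getLineInstance(locale);
        it.setText(text);
        int start = it.first();
        // Keep every segment, including trailing spaces and punctuation.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            segments.add(text.substring(start, end));
        }
        return segments;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox";
        List<String> parts = splitLines(text, Locale.ENGLISH);
        System.out.println(parts);
        // Nothing is dropped, so concatenation reproduces the input.
        System.out.println(String.join("", parts).equals(text)); // prints true
    }
}
```

Summing the display widths of these segments is one way to approach the rendered-length estimates mentioned earlier, since each segment is a unit the renderer will not split.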