Project Url: amplafi/htmlcleaner
Introduction: An active fork of

NOTE: This fork of htmlcleaner is now merged back into the project as of version 2.4

2.4 is officially released!

This fork is kept only to help with patch submission to the official version.


  • omitHtmlEnvelope behavior change:

    • output all the html contained in the body, not just the first TagNode's contents ( useful for cleaning html fragments ); a new blank TagNode is created to hold the nodes to be output
    • omitHtmlEnvelope also triggers omitDoctype
  • TagNodes that can be reopened after their parent is closed ( i.e. -- would result in ): if the reopened tag ( in this example ) is immediately closed, the reopened tag is pruned ( accomplished by checking the autoGenerated boolean on TagNode )

  • refactoring template methods from Utils to TagTransformer.

CleanerTransformations changes:

  • Utils.updateTagTransformations is now a member function.
  • Handles the transformation work so that multiple TagTransformations can be applied to a given tag ( sets up for regular expression matching ).
  • Now owns responsibility for determining the transformed tag name.
  • Added the concept of global AttributeTransformations -- used, for example, to strip all attributes that start with "on" ( i.e. "onclick" , "onblur" ).
  • Added regular expression matching on attribute names and values.
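
The idea behind a global attribute transformation that strips every attribute starting with "on" can be sketched with the standard library alone. This is only an illustration of name matching by regular expression, not htmlcleaner's own API: the class GlobalAttributeFilterSketch, the method stripMatching and the EVENT_HANDLER pattern are hypothetical names, and attributes are modelled as a plain Map rather than as TagNode attributes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: not part of htmlcleaner.
class GlobalAttributeFilterSketch {
    // Matches attribute names that start with "on" (onclick, onblur, ...), ignoring case.
    static final Pattern EVENT_HANDLER = Pattern.compile("^on.*", Pattern.CASE_INSENSITIVE);

    // Returns a copy of the attributes without the entries whose name matches the pattern.
    static Map<String, String> stripMatching(Map<String, String> attributes, Pattern namePattern) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            if (!namePattern.matcher(entry.getKey()).matches()) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("href", "/home");
        attrs.put("onclick", "track()");
        attrs.put("onBlur", "validate()");
        // Only the href attribute survives the filter.
        System.out.println(stripMatching(attrs, EVENT_HANDLER));
    }
}
```

Because the rule is applied to every tag rather than registered per tag name, a single pattern covers all event-handler attributes in the document.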

XmlSerializer/HtmlCleaner -- removed the IOException that was declared as thrown when reading from strings.

  • work on spotting "tricky" encoding -- normal ascii characters that arrive encoded are unencoded.

    • get Default Output charset from CleanerProperties

    • handle badly encoded numbers better; for example, &x0fx and &0A; were parsed badly before

    • added a bunch of html special entities

    • convert &apos; in html context to '

    • added regex attribute/value matching

    • random spelling corrections

    • additional documentation
  • add greek and math symbols

  • cleanup change - if tag was closed due to improperly placed child it will be reopened after the child. See for examples

  • added audit code - it is now possible to hook in code that will be notified about the changes that htmlcleaner makes. See CleanerProperties#addHtmlModificationListener.

  • Added unit tests for escapeXml function from Utils

  • JDom generation updated so that it no longer fails on attributes whose names start with 'xml'.

  • Unit tests TODOs added
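
The "unencode normal ascii characters" item in the encoding work above can be illustrated with a small standalone sketch. It is an assumption-laden illustration, not htmlcleaner's implementation: the class AsciiEntitySketch and its methods are hypothetical, only well-formed numeric references ( &#65; or &#x41; ) are decoded, markup-significant characters are deliberately left encoded, and malformed input such as &x0fx is passed through untouched rather than repaired.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: not part of htmlcleaner.
class AsciiEntitySketch {
    // Decimal (&#65;) or hexadecimal (&#x41;) numeric character references.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    // Replaces references to ordinary printable ASCII characters with the character itself.
    static String unencodeAscii(String text) {
        Matcher m = NUMERIC_REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int code = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            // Default: keep the reference exactly as it was written.
            String replacement = m.group();
            if (isPlainAscii(code)) {
                replacement = String.valueOf((char) code);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Printable ASCII, excluding characters that are significant in markup.
    static boolean isPlainAscii(int code) {
        return code >= 0x20 && code < 0x7F && "<>&\"'".indexOf(code) < 0;
    }

    public static void main(String[] args) {
        System.out.println(unencodeAscii("&#72;&#x69; &#60;b&#62; &x0fx"));
    }
}
```

Keeping characters such as < and & encoded is a design choice of this sketch: decoding them would change the structure of the markup rather than just tidy its text.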
