htmlcleaner

Project Url: amplafi/htmlcleaner
Introduction: An active fork of http://htmlcleaner.sourceforge.net

NOTE: As of version 2.4, this fork of htmlcleaner has been merged back into the http://htmlcleaner.sourceforge.net/ project.

2.4 is officially released!

This fork is kept only to help with patch submission to the official version.

==========================================================================

  • omitHtmlEnvelope behavior change:

    • outputs all the HTML contained in the body, not just the contents of the first TagNode (useful for cleaning HTML fragments); a new blank TagNode is created to hold the nodes to be output
    • omitHtmlEnvelope also triggers omitDoctype
  • TagNodes that can be reopened after their parent is closed: if the reopened tag is immediately closed, it is pruned. This is accomplished by checking the autoGenerated boolean on TagNode.

  • refactoring template methods from Utils to TagTransformer.
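The pruning rule for reopened tags can be illustrated with a small sketch. This is not htmlcleaner's code: `SketchNode` and `pruneEmptyReopened` are hypothetical names standing in for the real TagNode handling, and only the `autoGenerated` flag comes from the notes above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a parsed tag; NOT the real org.htmlcleaner.TagNode.
final class SketchNode {
    final String name;
    final boolean autoGenerated;   // true when the cleaner reopened the tag itself
    final List<SketchNode> children = new ArrayList<>();
    String text = "";

    SketchNode(String name, boolean autoGenerated) {
        this.name = name;
        this.autoGenerated = autoGenerated;
    }

    SketchNode add(SketchNode child) {
        children.add(child);
        return this;
    }

    boolean isEmpty() {
        return children.isEmpty() && text.isEmpty();
    }

    // Depth-first: prune inside each child first, then drop children that
    // were auto-generated and ended up with no content.
    static void pruneEmptyReopened(SketchNode node) {
        Iterator<SketchNode> it = node.children.iterator();
        while (it.hasNext()) {
            SketchNode child = it.next();
            pruneEmptyReopened(child);
            if (child.autoGenerated && child.isEmpty()) {
                it.remove();
            }
        }
    }
}
```

An empty tag written by the document's author (`autoGenerated == false`) is left alone in this sketch; only tags the cleaner created on its own are candidates for pruning.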

CleanerTransformations changes:

  • Utils.updateTagTransformations is now a member function.
  • Handles the transformation work so that multiple TagTransformations can be applied to a given tag (this sets up for regular expression matching).
  • Now owns responsibility for determining the transformed tag name.
  • Concept of global AttributeTransformations, used for example to strip all attributes that start with "on" (e.g. "onclick", "onblur").
  • Added regular expression matching on attribute names and values.
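The idea of a global attribute transformation that strips every attribute starting with "on" can be sketched with the standard library alone. The class and method names below (`GlobalAttributeFilterSketch`, `strip`) are assumptions made for illustration and are not part of the htmlcleaner API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative only: a regex-based filter over a tag's attribute map.
final class GlobalAttributeFilterSketch {
    // Matches attribute names beginning with "on" (onclick, onblur, ...), ignoring case.
    static final Pattern EVENT_HANDLER = Pattern.compile("^on.*", Pattern.CASE_INSENSITIVE);

    // Returns a copy of the attributes without the names matched by namePattern.
    static Map<String, String> strip(Map<String, String> attributes, Pattern namePattern) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            if (!namePattern.matcher(e.getKey()).matches()) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }
}
```

Because the rule is a pattern on the attribute name rather than a per-tag entry, the same filter applies to every tag, which is what makes the transformation "global".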

XmlSerializer/HtmlCleaner -- removed the IOException that was thrown when reading from strings.

  • work on spotting "tricky" encoding -- decode ordinary ASCII characters that were needlessly encoded.

    • get the default output charset from CleanerProperties

    • handle badly encoded numbers better; for example, &x0fx and &0A; were parsed badly before

    • added a bunch of html special entities

    • convert &apos; in html context to &#39;

    • added regex attribute/value matching

    • random spelling corrections

    • additional documentation
  • add greek and math symbols

  • cleanup change - if a tag was closed because of an improperly placed child, it is reopened after the child. See ClosedTagReopenTest.java for examples

  • added audit code - it is now possible to hook in code that is notified about the changes htmlcleaner makes. See CleanerProperties#addHtmlModificationListener.

  • Added unit tests for the escapeXml function in Utils

  • JDom generation updated so that it no longer fails on attributes whose names start with 'xml'.

  • Unit tests TODOs added
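Two of the entity-handling items in the list above can be sketched in a few lines. This is a rough illustration, not htmlcleaner's implementation: it assumes the apostrophe item refers to rewriting `&apos;` as `&#39;`, it reads "ordinary ASCII characters" narrowly as letters and digits, and the name `EntitySketch.normalize` is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: normalizes a few entity forms in an HTML string.
final class EntitySketch {
    // Decimal (&#65;) or hexadecimal (&#x41;) numeric character references.
    private static final Pattern NUMERIC =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    static String normalize(String html) {
        // &apos; is defined by XML but not by HTML 4, so use the numeric form instead.
        String s = html.replace("&apos;", "&#39;");
        Matcher m = NUMERIC.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int code = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            // Decode only ASCII letters and digits; markup characters stay encoded.
            boolean plain = code < 128 && Character.isLetterOrDigit(code);
            String replacement = plain ? String.valueOf((char) code) : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Characters such as `<`, `>` and `'` are deliberately left encoded here, since decoding them could change how the surrounding markup is parsed.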
