htmlcleaner
NOTE: This fork of htmlcleaner is now merged back into the http://htmlcleaner.sourceforge.net/ project as of version 2.4
2.4 is officially released!
This fork is kept only to help with patch submission to the official version.
==========================================================================
omitHtmlEnvelope behavior change:
- output all the HTML contained in the body, not just the contents of the first TagNode ( useful for cleaning HTML fragments ). A new blank TagNode is created to hold the nodes to be output.
- omitHtmlEnvelope also triggers omitDoctype
TagNodes can be reopened after their parent is closed. If the reopened tag is immediately closed, the reopened tag is pruned ( accomplished by checking the autoGenerated boolean on TagNode ).
refactoring template methods from Utils to TagTransformer.
*CleanerTransformations changes:
- Utils.updateTagTransformations is now a member function.
- Handles the transformation work so that multiple TagTransformations can be applied to a given tag ( sets up for regular expression matching ).
- now owns responsibility for determining the transformed tag name.
*concept of global AttributeTransformations -- used, for example, to strip all attributes that start with "on" ( i.e. "onclick", "onblur" )
- plus added regular expression matching on attribute names/values
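The global attribute stripping described above can be pictured with a small standalone sketch. It uses only the JDK and is not the htmlcleaner API: the class GlobalAttributeFilter and its apply method are hypothetical names chosen for illustration, and the attributes are modelled as a plain map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of htmlcleaner: drops every attribute whose
// name matches a regular expression, regardless of which tag carries it.
class GlobalAttributeFilter {
    private final Pattern namePattern;

    GlobalAttributeFilter(String nameRegex) {
        // HTML attribute names are case-insensitive, so match accordingly.
        this.namePattern = Pattern.compile(nameRegex, Pattern.CASE_INSENSITIVE);
    }

    // Returns a copy of the attributes without the matching names; order is kept.
    Map<String, String> apply(Map<String, String> attributes) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            if (!namePattern.matcher(e.getKey()).matches()) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }
}
```

With the pattern "^on.*", attributes such as "onclick" and "onBlur" are removed while "href" and "title" are kept. Only the attribute name is tested, so a value that happens to start with "on" is not affected.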
XmlSerializer/HtmlCleaner -- no longer throw IOException when reading from strings.
work on spotting "tricky" encoding -- unencode normal ascii characters.
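One way to read "unencode normal ascii characters" is: numeric character references that only hide a plain letter or digit are turned back into that character, so obfuscated text becomes visible again. The sketch below shows that idea with the JDK only. AsciiEntityDecoder is a hypothetical name, and the exact rules htmlcleaner applies may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the htmlcleaner implementation.
class AsciiEntityDecoder {
    // Matches decimal (&#106;) and hexadecimal (&#x6A;) numeric character references.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    // Replaces references to ASCII letters and digits with the literal character.
    // Everything else (markup characters, non-ASCII) is left encoded.
    static String decodeAsciiAlphanumerics(String input) {
        Matcher m = NUMERIC_REF.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            boolean plainAscii = codePoint < 128 && Character.isLetterOrDigit(codePoint);
            String replacement = plainAscii ? String.valueOf((char) codePoint) : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

In this sketch, references to characters such as "<" stay encoded, because decoding them could change the structure of the markup.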
get Default Output charset from CleanerProperties
handle badly encoded numbers better; for example, &x0fx and &0A; were parsed badly before
added a bunch of html special entities
convert &apos; in html context to '
added regex attribute/value matching
random spelling corrections
- additional documentation
add greek and math symbols
cleanup change - if a tag was closed due to an improperly placed child, it will be reopened after the child. See ClosedTagReopenTest.java for examples.
added audit code - it is now possible to hook in code that will be notified about the changes that htmlcleaner makes. See CleanerProperties#addHtmlModificationListener.
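The audit hook follows the usual listener pattern. The sketch below shows the shape of that pattern with the JDK only. ModificationListener and AuditedCleaner are hypothetical stand-ins, and the real listener interface registered through CleanerProperties#addHtmlModificationListener has different methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback; the real htmlcleaner listener interface differs.
interface ModificationListener {
    void modified(String tagName, String reason);
}

// Hypothetical stand-in for a cleaner that reports each change it makes.
class AuditedCleaner {
    private final List<ModificationListener> listeners = new ArrayList<>();

    void addListener(ModificationListener listener) {
        listeners.add(listener);
    }

    // Drops tag names that are not in the allowed list and notifies every listener.
    List<String> keepAllowed(List<String> tagNames, List<String> allowed) {
        List<String> kept = new ArrayList<>();
        for (String name : tagNames) {
            if (allowed.contains(name)) {
                kept.add(name);
            } else {
                for (ModificationListener l : listeners) {
                    l.modified(name, "tag not allowed");
                }
            }
        }
        return kept;
    }
}
```

A caller could register a lambda such as (tag, reason) -> log.add(tag + ": " + reason) to collect an audit trail of everything that was removed.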
Added unit tests for escapeXml function from Utils
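For readers unfamiliar with what an XML escaping function has to cover, here is a minimal sketch of the basic case: replacing the five characters that XML reserves. XmlEscapeSketch is a hypothetical class, not the escapeXml function in Utils, which takes additional parameters and handles more cases.

```java
// Hypothetical helper, not the htmlcleaner Utils.escapeXml function.
class XmlEscapeSketch {
    // Replaces the five characters reserved by XML with their predefined entities.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

This sketch escapes every ampersand, so input that already contains entities is escaped a second time. Recognising existing entities is the kind of case unit tests for an escaping function would need to pin down.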
JDom generation updated so that it no longer fails on attributes starting with 'xml'.
Unit tests TODOs added