While trawling the depths of mailing lists this evening, I stumbled upon this new release:
kses is an HTML/XHTML filter written in PHP. It removes all unwanted HTML elements and attributes, and it also does several checks on attribute values. kses can be used to avoid Cross-Site Scripting (XSS), Buffer Overflows and Denial of Service attacks.
The current version (0.2.1) is mostly a bugfix release. This looks like a good package for wrangling mangled HTML/XHTML, and it should help fend off those pesky script kiddies who try to sneak malicious markup into your pages.