Bug in SgmlReader

Update 20 April 2010

SgmlReader 1.8.6 has this problem fixed.

Original post

Chris Lovett of Microsoft wrote SgmlReader 1.7 and has kindly shared it with the world. What does it do? In his own words:

An XmlReader implementation for loading SGML (including HTML) converting it to well formed XML, by adding missing quotes, empty attribute values, ignoring duplicate attributes, case folding on tag names, adding missing closing tags based on SGML DTD information, and so on.

It derives from XmlReader, so you can pass an SgmlReader object to any class that consumes an XmlReader, e.g., XPathDocument.

For my current project at the office, SgmlReader is used to parse HTML, and an XSL template is then applied to the generated XML. Everything was fine during smoke testing, but in QA we hit an intermittent System.NullReferenceException (“Object reference not set to an instance of an object”). After some hours I found out that it’s caused by boolean HTML attributes in minimized form, e.g. noShade in the following:

<HR width="100%" color=#000000 noShade SIZE=1>

Original code in SgmlReader.cs:

public override string Value {
	get {
		if (this.state == State.Attr || this.state == State.AttrValue) {
			return this.a.Value;
		}
		return this.node.Value;
	}
}

My solution (explanation in comments):

public override string Value {
	get {
		if (this.state == State.Attr || this.state == State.AttrValue) {
			// this.a.Value is null when an HTML boolean attribute appears in minimized form.
			// Returning that null from Value is what leads to the NullReferenceException in the consumer.
			// Full list of HTML boolean attributes: compact, nowrap, ismap, declare, noshade,
			// checked, disabled, readonly, multiple, selected, noresize, defer.
			// So when the value is null we return the attribute name instead;
			// in other words, we expand the minimized form to the non-minimized form.
			return (this.a.Value != null ? this.a.Value : this.a.Name);
		}
		return this.node.Value;
	}
}
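
To see the effect of the fallback in isolation, here is a minimal sketch. The Attr class and the ValueOrName helper are made up for this illustration and are not part of SgmlReader; they only mimic an attribute that has a name and a possibly null value.

```csharp
using System;

// Hypothetical stand-in for the reader's internal attribute object;
// only Name and Value matter for this illustration.
class Attr
{
	public string Name;
	public string Value;	// null when the attribute was minimized, e.g. noShade

	public Attr(string name, string value)
	{
		Name = name;
		Value = value;
	}
}

static class Demo
{
	// Same fallback as the patched getter: use the name when there is no value.
	public static string ValueOrName(Attr a)
	{
		return a.Value != null ? a.Value : a.Name;
	}

	static void Main()
	{
		Attr width = new Attr("width", "100%");
		Attr noShade = new Attr("noShade", null);

		Console.WriteLine(ValueOrName(width));		// prints 100%
		Console.WriteLine(ValueOrName(noShade));	// prints noShade
	}
}
```

With the fallback in place, a consumer that reads the value of a minimized attribute gets a real string back instead of null.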

26 August 2008 | .NET, C#, ASP.NET
