View Full Version : DOM fundamentally broken

04-07-2005, 02:08 PM
Well guys, all of you sometimes need to traverse a collection of elements. Most of the time it's div tags. You need to mark a set of divs in your page so that later you'll be able to iterate over them. But don't restrict the problem to divs; you should be able to iterate over p tags, table tags and so on. Sure, we all want our code to work everywhere, or at least in every browser conforming to the W3C specs. So we dive right into it.
We see that every HTML element has the following properties:

interface HTMLElement : Element {
  attribute DOMString id;
  attribute DOMString title;
  attribute DOMString lang;
  attribute DOMString dir;
  attribute DOMString className;
};

They are id, title, lang, dir, className.

The only way we could mark our elements (div, p, table, etc.) is by id (which according to the same spec should be unique); title is for tooltips, and lang and dir have their specific meanings, as does className.

Somewhere up in the spec, more precisely
we see the definition of HTMLDocument (the root of the HTML hierarchy, which holds the entire content).
So we start from the top and we want to find our elements. What tools does HTMLDocument give us? The only methods you'll find are:

Element getElementById(in DOMString elementId);
NodeList getElementsByName(in DOMString elementName);

There aren't any other getElementXXX methods, except

NodeList getElementsByTagName(in DOMString tagname);
which is defined in the more general interface Document

So we could search only by:
1. ID attribute
2. Name attribute
3. Tag name itself

1. ID doesn't do the job, because id attributes should be unique: you can't have multiple div tags with one and the same id. Moreover, getElementById returns just one element, and we want them all.
2. Name is the fundamentally broken feature. According to the reference:

Returns the (possibly empty) collection of elements whose name value is given by elementName.
elementName of type DOMString
The name attribute value for an element.

Return Value
The matching elements.

This sounds great until you realize that, by the same spec, HTMLElement does not have a name attribute, as we already saw :mad: Only some HTML elements have a name attribute, e.g. INPUT. So it's illegal to have a div tag with a name attribute, and you can't capture divs or p (paragraph) tags by name using getElementsByName(). This is by spec. IE observes it. However, Mozilla (Firefox) doesn't: you can put name attributes everywhere and the method will select them.

3. getElementsByTagName isn't useful either. Sure, you could select all of your divs, but actually you want only a subset of them, marked statically! (We don't even want to select them on a dynamic criterion: some tags are marked at the time the HTML is produced, e.g. by PHP; we don't mark tags at JavaScript runtime.)

So, the problem is clear: the id and name attributes have a long-known bad history and frequently lead to confusion, and my research into the DOM made me think that the problem is actually within the W3C specification, not the browsers. Both IE and Mozilla claim to support DOM Level 1, but they don't work the same.
Sure, most developers (and managers) don't actually have a problem: IE has 90% of the market, and in most cases you don't have to worry about the others. In large projects, however, the solution is to write ugly ifs in order to reveal the identity of your thin client.
We could go even further and investigate the reason why (why, Mr. Anderson, why?) the W3C made this mistake, i.e. why there isn't a standard way of selecting a group of elements. When I look at the names:

Mike Champion, ArborText
Vidur Apparao, Netscape
Scott Isaacs, Microsoft (until January 1998)
Chris Wilson, Microsoft (after January 1998)
Ian Jacobs, W3C

I wasn't able to find a good reason.

P.S. All of the above concerns DOM Level 1. However, DOM Level 2 and 3 are no exception.

04-07-2005, 02:14 PM
Just define your own attribute - the browsers will parse it regardless and it is legal from the specifications point of view in XML/XHTML

<div myelement="myelement"></div>

var alldivs = document.getElementsByTagName('div');
for (var i = 0; i < alldivs.length; i++) {
  if (alldivs[i].getAttribute('myelement')) { /* do something */ }
}

... that is, if for some reason giving the element a class is not good enough for you....

04-07-2005, 02:29 PM
Yes, but what may look weirder is the fact that getElementsByName(name)[index] will really work in all the browsers, despite the fact that name is not a recommended property of an HTML element. It's the same with the famous innerHTML, which is not recommended by the W3C, yet all the modern browsers support it.

Name remains an important property for radio buttons, at least as long as there is no other HTML way of building them. This might be the reason name was kept along with id. (Frankly, I don't even see why a new id property was introduced, as long as it does not seem to act much differently from the old name...)

You should take into consideration that the stupid war between Microsoft and Netscape (in fact between Microsoft and the rest of the World :D ) made the W3C "reconciliation" process almost impossible. DOM3 keeps all the old form element reference methods as well, even though they may look deprecated. Or maybe they are not... who knows?...

Still, I don't see why you worry. You may combine id with tag name to build/traverse a separate collection
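
One way to read that suggestion is sketched below, assuming the marked elements are given ids that share a prefix (say item1, item2, ...). The function name getByTagAndIdPrefix and the prefix convention are made up for illustration; only getElementsByTagName and the id property come from the DOM.

```javascript
// Hypothetical helper: collect the elements with a given tag name whose id
// starts with a shared prefix. The prefix convention is an assumption.
function getByTagAndIdPrefix(root, tagName, prefix) {
  var candidates = root.getElementsByTagName(tagName);
  var found = [];
  for (var i = 0; i < candidates.length; i++) {
    var id = candidates[i].id || '';
    if (id.indexOf(prefix) === 0) {
      found.push(candidates[i]);
    }
  }
  return found;
}
```

In a page this would be called as getByTagAndIdPrefix(document, 'div', 'item'). It still visits every div in the document.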


04-07-2005, 03:18 PM
Just define your own attribute - the browsers will parse it regardless and it is legal from the specifications point of view in XML/XHTML

I'm not saying that there isn't a solution to the problem. I said that the DOM spec is broken. You can't define a global getElementsByName() without defining a "name" attribute for every HTML element.

Your solution really works, but iterating in JavaScript over ALL elements (what document.getElementsByTagName() gives you) is somewhat, hm ... All in all, what people really need is a way of marking their arbitrary HTML elements and iterating over them using a built-in method. Speed is important in large HTML documents; time is money, remember? Nobody will wait for such a solution on a page with a couple of thousand elements.

04-07-2005, 03:23 PM
the browsers will parse it regardless and it is legal from the specifications point of view in XML/XHTML

If I extend this trick using childNodes, it won't work

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta http-equiv="Content-Style-Type" content="text/css">
<meta http-equiv="Content-Script-Type" content="text/javascript">
<script language="JavaScript" type="text/JavaScript">
var allElements = document.getElementsByTagName('body')[0].childNodes;
for(var i=0;i<allElements.length;i++){
  /* ... */
}
</script>
<div foo="fee">uu</div>

04-07-2005, 03:29 PM

Yes, it's also a sensible solution, but in fact I've just written down my disgust at what the W3C has fabricated as the HTML DOM standard.

04-07-2005, 03:46 PM
Anyway... it looks a little better than the absolutely disjunctive document.all vs document.layers references war...

My problem is a different one (well, somehow related). I feel orphaned since the DOM took away the possibility of cycling through all the elements of the document, no matter the tag name.

On one side, I like traversing through childNodes. On the other, I hate that IE and Moz compute the childNodes length in different ways (depending on how they treat text nodes), so that the method is not of great use.
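
A possible way to even out that difference, assuming it comes only from whitespace-only text nodes that one browser keeps and the other drops, is to skip those nodes yourself. The name significantChildren is hypothetical.

```javascript
// Hypothetical helper: return the child nodes of parent, leaving out text
// nodes (nodeType 3) that contain nothing but spaces, tabs and line breaks.
function significantChildren(parent) {
  var kept = [];
  var kids = parent.childNodes;
  for (var i = 0; i < kids.length; i++) {
    var kid = kids[i];
    var isBlankText = kid.nodeType === 3 && !/[^ \t\r\n]/.test(kid.data);
    if (!isBlankText) {
      kept.push(kid);
    }
  }
  return kept;
}
```

With this, significantChildren(someElement).length should no longer depend on how the markup was indented. A non-breaking space is deliberately not treated as blank here.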

Here we come again to the IE vs Rest of the World war :D

04-07-2005, 03:53 PM
By the way... how can I separate the "tagged" children from the text node children? Does anyone know? How can I traverse text nodes only? "Tagged" children only?

04-07-2005, 05:43 PM

Actually, you fail to see some important things regarding all four of DOM, HTML, CSS and XML. First of all, the name attribute was considered severely broken by the W3C (and thus by the representatives of the most important web technology companies, including all major browser makers) - the semantics and the content model of the name attribute differed between form fields and non-form-field elements, and implementations had a hard time conforming to either, let alone both. The solution was to create an unambiguous attribute for one use and keep name for the other use - this new attribute was the id attribute. The situation with the name attribute became even worse when the W3C introduced an element that could be a form field at certain times but most often was not (object). For backward compatibility, the name attribute couldn't just be deprecated, though, so they kept it but noted in the specs that it was there only for backward compatibility. A sure way to get unchanged usage statistics for it.

Now, if you believe the situation was any better with both types of name attribute and the new id attribute, you are mistaken. The content models of the id and the non-form-field name attributes were different, but they shared the same namespace. Both were required to be unique in that namespace. On the other hand, the form fields had their own specific quirks. The id attribute there actually added some functionality. Theoretically, the name attribute turned into what it had always been - a way to tag user-entered values with a token name when sent to the server. This name attribute still had different semantics for different cases, though. Looking at the select element, it allowed multiple values with the same token name, but that token had to be unique inside the form. Looking at other form fields, the demand was that radio and check box form fields may have non-unique token names, with all other form fields limited to a unique name inside the form.

If I try to sum that mess up, it goes like this:
- The id attribute allows unique access to elements in the document.
- The name attribute for form fields acted as intended for form submissions, but had a set of different semantics for different form fields.
- The name attribute for non-form fields allows unique access to elements in the document, shares a namespace with the id attribute and allows a different set of characters, something that caused browser makers a headache.

And in all that, I've overlooked one thing. Just to complicate things, there's another type of name attribute with yet again separate semantics, and that is for frames. But I'll just mention that it doesn't fit any of the above mentioned usages and move on.

Enter proto-CSS. CSS needed something new: a truly non-unique token that could be used to apply styles to groups of elements, as well as a unique way of styling only a single element. The name attribute didn't quite match either demand. The id attribute matched the latter demand, but that meant one attribute was missing. The attribute to fill the demand was class. This is your non-unique token. It fit the bill, and was unambiguous at that.

Enter the W3C DOM. The DOM was at once an attempt to unify the different object models of the browsers, an improvement upon the existing models, and a correction of faulty design made primarily by Netscape in the creation of their BOM for JavaScript. Sadly, the people developing the DOM never saw the possible need for a non-unique token - possibly because the DOM was so much more than just a browser technology, and the CSS people weren't involved at first. The DOM created a number of traversal methods, but in its initial release it was not as well thought out as we would like it to have been. Microsoft being a leader of the development at this point, their hurry to release version five of their browser, and Sun's Java being better suited to an entirely different method of traversal than is the case with JavaScript, probably meant the entire spec was hurried.

There are a number of things that DOM1 missed. One of them was that the name attribute is semantically ambiguous and cannot be treated the same in all cases. On the other hand, it can be said that the DOM makes the right decision: it tries on purpose to phase out the name attribute, and the only reason it provides handling for it at all is that it wants to add some DOM0 backward compatibility.

Enter XML, XHTML, XML namespaces. The W3C created XML to build an improved SGML. It was not fully SGML compatible, but SGML could be amended. What it did with SGML was to kill off ambiguities and increase the demands put on the document editor to further simplify implementation. XML went a step beyond SGML in some areas, though. Through XML namespaces and a reduced dependency on DTDs, it created a way for applications to be mixed and shared. The W3C took the chance and converted the HTML4 specification to XML in the form of XHTML1. Not everything is exactly the same, but the difference is minimal. However, this meant a lot of things. For one, the W3C took the chance to clean up the name mess (actually things like shared namespaces between ID type and other types, as well as a demand that there only exist one ID type attribute per element, forced some of the changes in converting HTML from SGML to XML) and removed deprecated stuff. Another thing is that qnames suddenly turned case sensitive. A third thing is that which elements are empty is no longer specified in the DTD; it's specified explicitly in the document, using a syntax that had another meaning in SGML.

Back to the DOM. Ask yourself, "what happens when we bring XHTML to the table?" The DOM earlier heeded HTML4 recommendations (normalisation of all tag and attribute names to uppercase is one example), but XHTML needs to be treated like XML first, and HTML second, when possible. The DOM tightened up and, as an example, removed the name mess for XHTML documents. It added namespaces, and style sheets at last got an object model. Something else added to the DOM at this stage was a set of traversal methods that could theoretically replace the single-purpose functions with a much more general traversal method, and with that the need to create a specific function for non-unique tokens in the form of the class attribute was, from the spec's point of view, removed.
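
The traversal methods mentioned here are those of the DOM Level 2 Traversal module (TreeWalker and NodeIterator). As a rough sketch, assuming a browser that implements that module, a class lookup built on top of it might look like this. The function name getByClassWithWalker is made up, and the doc parameter is only there so the function does not depend on a global document.

```javascript
// Sketch only: collect the elements below root that carry the given class
// token, using a DOM Level 2 TreeWalker.
// 1 is the numeric value of NodeFilter.SHOW_ELEMENT.
function getByClassWithWalker(doc, root, token) {
  var walker = doc.createTreeWalker(root, 1, null, false);
  var found = [];
  var node;
  while ((node = walker.nextNode())) {
    // Pad and normalise whitespace so whole tokens can be matched.
    var classes = (' ' + (node.className || '') + ' ').replace(/\s+/g, ' ');
    if (classes.indexOf(' ' + token + ' ') !== -1) {
      found.push(node);
    }
  }
  return found;
}
```

In a page the call would be getByClassWithWalker(document, document.body, 'note'), where 'note' is just an example class name. The walker does the tree walking; the script only decides which nodes to keep.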

And despite DOM3, that's about where we stand today.

All this is of course an attempt by me to explain why the DOM isn't the thing that is broken - it has its faults, but those have in most cases been corrected already. No, the fault lies in a number of different places:
- HTML versions earlier than XHTML, and more specifically the name attribute ambiguity.
- Design faults in Netscape's original BOM, which assumed no ambiguities existed.
- A failure to push for a non-unique token name in the HTML DOM from the CSS people and the non-server-side, non-Java folks. (Java is better suited to the newer traversal system.)
- A failure of editors (automatic generators as well as humans) to recognise the ambiguity.
- Microsoft's failure to support XHTML, which denies us the cleanup of this ambiguity that XHTML brought.

04-07-2005, 05:59 PM
What about the childNodes length and text nodes (I know, we have discussed parts of this now and then, yet...)?

And even Moz (which follows the W3C recs more responsibly) seems to make a difference between

which sometimes drives me crazy... ;)

I know that, for the W3C, it seems more logical to count possible text nodes as childNodes even if there is only empty space there (due to HTML design habits). Traversing with childNodes looks like a wonderful weapon, but it looks so "decalibrated" across different browsers that you must do a lot of preliminary adjustments to be able to finally use it...

04-07-2005, 06:36 PM
Well, that should be simple to explain: the first case doesn't contain any characters and thus no text node. The second case contains a text node "[U+00a0]", and the third case contains a text node "[U+000a]".

When traversing nodes, you need to skip any node whose type is not what you want. If your traversal mechanism has that built in, then everything will work as it should.
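
For direct children, that skipping can be as small as a nodeType check. A minimal sketch, where the name childrenOfType is made up for illustration; in the DOM, nodeType 1 is an element node and nodeType 3 is a text node.

```javascript
// Hypothetical helper: return only the children of parent that have the
// given nodeType (1 = element node, 3 = text node).
function childrenOfType(parent, nodeType) {
  var matches = [];
  for (var i = 0; i < parent.childNodes.length; i++) {
    if (parent.childNodes[i].nodeType === nodeType) {
      matches.push(parent.childNodes[i]);
    }
  }
  return matches;
}
```

childrenOfType(document.body, 1) would then give the "tagged" children only, and childrenOfType(document.body, 3) the text nodes only.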

04-07-2005, 07:12 PM
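Kor: Consider a node traversal function something like this:
// Generic traversal function, in-order, depth-first.
// (Main loop reconstructed; shallow mode is assumed to mean "direct children only".)
function traverse(ndRoot, fnCondition, fnAction, bIsDeep){
    var nd=bIsDeep?ndRoot:ndRoot.firstChild;
    while(nd){
        if(fnCondition(nd)) fnAction(nd);
        nd=bIsDeep?goDeeper(nd):nd.nextSibling;
    }

    function goDeeper(nd){
        return nd.firstChild||goNext(nd);
    }

    function goNext(nd){
        // Never step sideways from the root itself.
        return nd!=ndRoot&&(nd.nextSibling||goShallower(nd));
    }

    function goShallower(nd){
        // Climb until an ancestor below ndRoot has a next sibling.
        nd=nd.parentNode;
        return nd!=ndRoot&&(nd.nextSibling||goShallower(nd));
    }
}
With an example usage like this:
function isTextNode(nd){
    return nd.nodeType==3;
}

function alertData(nd){
    return alert(nd.nodeName+': "'+nd.data+'"');
}

traverse(document.body, isTextNode, alertData, true);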

It's far from the most efficient traversal method, but it's a very generic function that can take any type of condition and action in the form of a function and apply them to all nodes you want it to. The fact that the condition is a function can make for interesting traversal styles...

04-08-2005, 12:21 PM
Thanks, liorean... My unfinished attempts were quite close to your example. Yes, maybe it is not the most efficient, but, yes, you sensed it: I was looking for a generic way of traversing.

For me it was rather a theoretical problem. Yet sometimes I have felt the need for something similar... For instance, searching for all the elements which have the same class name, no matter the tag.
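
That particular search needs nothing beyond DOM Level 1 properties. A minimal sketch, assuming class tokens are separated by whitespace; the names hasClass and getElementsByClass are made up here and are not DOM methods.

```javascript
// Hypothetical helper: true when node.className contains token as a whole word.
function hasClass(node, token) {
  var padded = (' ' + (node.className || '') + ' ').replace(/\s+/g, ' ');
  return padded.indexOf(' ' + token + ' ') !== -1;
}

// Hypothetical helper: collect every descendant element of root that carries
// the class token, whatever its tag name, in document order.
function getElementsByClass(root, token) {
  var found = [];
  function visit(node) {
    for (var child = node.firstChild; child; child = child.nextSibling) {
      if (child.nodeType === 1) { // element nodes only; text nodes are skipped
        if (hasClass(child, token)) {
          found.push(child);
        }
        visit(child);
      }
    }
  }
  visit(root);
  return found;
}
```

Called as getElementsByClass(document.body, 'note'), with 'note' standing in for whatever class marks the elements. The root itself is not included in the result, only its descendants.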

12-15-2005, 05:45 PM
I feel the need to raise this thread again, as I still don't understand how the W3C will follow its policy of removing the "name" attribute in a subsequent version of XHTML while they have not shown a solution for groups of radio buttons. Any news about this?

12-15-2005, 06:50 PM
Name is not intended to be removed from form controls. Just from anything that is not a form control.

Names for form controls, for the purpose of form submission, behave mostly the way they want them to. It's the abuse of names on non-form-controls that they want to remove.
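
For a radio group, the shared name is what ties the buttons together. A small sketch of reading the selected value, assuming radios is the list of inputs that share one name (for example, what getElementsByName returns); the function name checkedValue is hypothetical.

```javascript
// Hypothetical helper: return the value of the checked radio button in a
// group, or null when none of them is checked.
function checkedValue(radios) {
  for (var i = 0; i < radios.length; i++) {
    if (radios[i].checked) {
      return radios[i].value;
    }
  }
  return null;
}
```

In a page this could be called as checkedValue(document.getElementsByName('size')), where 'size' is a made-up group name.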

Hehe, so this post is the basis for that 'cfpost.js' file I found in my old 'under development' directory... I've got a traverse function with both depth-first and breadth-first traversal that seems to have a basis in the function I listed here.