:chap_num: 13
:prev_link: 12_browser
:next_link: 14_FIXME

= The Document Object Model =

A JavaScript program running in the browser is locked up in its
sandbox, unable to interact with the rest of the system. But it is not
alone. The web page itself, the document that the browser is
displaying, is in there as well.

Interacting with this document, in order to enhance it, make it
interactive, or turn it into a full-blown application, is what
JavaScript was invented for.

== Document structure ==

An HTML document can be visualized as a nested set of boxes. Tags like
`<body>` and `</body>` enclose other tags, which in turn contain other
tags (or text).

[sandbox="homepage"]
[source,text/html]
----
<!doctype html>
<html>
  <head>
    <title>My home page</title>
  </head>
  <body>
    <h1>My home page</h1>
    <p>Hello, I am Marijn and this is my home page.</p>
    <p>I also wrote a book! Read it
      <a href="http://eloquentjavascript.net">here</a>.</p>
  </body>
</html>
----

This example page has the following structure:

image::img/html-boxes.svg[alt="HTML document as nested boxes"]

The data structure the browser uses to represent the document follows
this shape. For each box, there is an object, which we can interact
with to find out things like what HTML tag it represents and which
boxes and text it contains. This representation is called the
_Document Object Model_, or DOM for short.

The global variable `document` gives us access to these objects. Its
`documentElement` property refers to the object representing the
`<html>` tag. It also provides the properties `head` and `body`, which
hold the objects for those elements. The body, the actual visible part
of the document, is usually the element we want to work with.

== Trees ==

Think back to the syntax trees from Chapter 11 for a moment. Their
structure is strikingly similar to the structure of a browser's
document. Each node may refer to other nodes, its _children_, which
may in turn have their own children. This shape is typical of nested
structures where the same kind of element can be repeated inside
existing elements.

We call a data structure a _tree_ when it has a branching structure,
contains no cycles (a node may not contain itself, directly or
indirectly), and has a single, well-defined “root”.

Trees come up a lot in computer science. Apart from representing
recursive structures like the programs from Chapter 11 and HTML
documents, they are also commonly used to maintain sorted sets of
data, because elements can often be found or inserted more efficiently
in a sorted tree than in a sorted flat array.
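To see why a sorted tree helps, here is a minimal sketch of an unbalanced binary search tree. It has nothing to do with the DOM, and the functions `insert` and `contains` are hypothetical names chosen for this illustration. Every node keeps smaller values in its left subtree and larger values in its right subtree, so finding or inserting a value only follows a single path from the root downward, whereas inserting into a sorted array may require shifting many elements over.

[source,javascript]
----
// A minimal, unbalanced binary search tree (illustration only).
// Each node is {value, left, right}; an empty tree is null.
function insert(tree, value) {
  if (tree == null)
    return {value: value, left: null, right: null};
  if (value < tree.value)
    tree.left = insert(tree.left, value);
  else if (value > tree.value)
    tree.right = insert(tree.right, value);
  // An equal value is already present, so nothing changes.
  return tree;
}

function contains(tree, value) {
  // Walk down a single path, going left or right at each node.
  while (tree != null) {
    if (value == tree.value) return true;
    tree = value < tree.value ? tree.left : tree.right;
  }
  return false;
}

var tree = null;
[50, 30, 70, 20, 40, 60].forEach(function(n) {
  tree = insert(tree, n);
});
console.log(contains(tree, 40));
// → true
console.log(contains(tree, 45));
// → false
----

This sketch does no rebalancing, so feeding it values that are already sorted produces a lopsided tree that is no faster than a list. Practical sorted trees add rules that keep them balanced.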

A typical tree has different kinds of nodes. The syntax tree had
variables, values, and application nodes. Application nodes always
had children, whereas variables and values were _leaves_, nodes
without children.

The same goes for the DOM. Nodes for regular elements (the
representations of tags in the document) make up the structure of the
document. These can have child nodes, but need not. An example of
such a node is `document.body`. Some of these children can be leaf
nodes, such as pieces of text or comments (which are written between
`<!--` and `-->` in HTML).

Each DOM node object has a `nodeType` property, which contains a
number code that identifies the type of node. Element nodes have the
value 1, which is also defined as the constant property
`document.ELEMENT_NODE`. Text nodes, representing a section of plain
(non-tag) text in the document, get type 3 (`document.TEXT_NODE`).
Comments get type 8 (`document.COMMENT_NODE`).
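Code that handles several kinds of nodes usually branches on `nodeType`. The following sketch uses a hypothetical function, `describeNode`, to produce a short description of a node. It only reads `nodeType`, `nodeName`, and `nodeValue`, so besides real DOM nodes it also accepts plain objects that imitate them, which is how it is tried out here. In a browser, the bare numbers could be replaced by constants such as `document.ELEMENT_NODE`.

[source,javascript]
----
// Describe a node based on its nodeType code:
// 1 = element, 3 = text, 8 = comment.
function describeNode(node) {
  if (node.nodeType == 1)
    return "element <" + node.nodeName.toLowerCase() + ">";
  else if (node.nodeType == 3)
    return "text " + JSON.stringify(node.nodeValue);
  else if (node.nodeType == 8)
    return "comment";
  else
    return "node of type " + node.nodeType;
}

// Plain objects standing in for real DOM nodes.
console.log(describeNode({nodeType: 1, nodeName: "H1"}));
// → element <h1>
console.log(describeNode({nodeType: 3, nodeValue: "Hello"}));
// → text "Hello"
----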

Another way to visualize our document tree is as follows:

image::img/html-tree.svg[alt="HTML document as a tree"]

The leaves are text nodes, and the arrows indicate parent-child
relationships between nodes.

== The standard ==

Using cryptic number codes to represent node types is not a very
JavaScript-like thing to do. Further on in this chapter, we'll see
that other parts of the DOM interface also feel rather cumbersome and
alien. The reason for this is that the DOM wasn't designed for just
JavaScript. Rather, it tries to define a language-neutral interface
that can be used in other systems as well, and not just for HTML but
also for XML, which is a generic data format with an HTML-like syntax.

This is unfortunate. Standards are often useful. But in this case, the
advantage (cross-language consistency) isn't all that compelling. And
the downside, having an interface that's not well integrated with the
language, is rather serious.

As an example of such poor integration, consider the `childNodes`
property that element nodes in the DOM have. This property holds an
array-like object, with a `length` property and properties labeled by
numbers (`0`, `1`, ...) to access the child nodes. But it is an
instance of the `NodeList` type, not a real array, so it does not have
methods like `slice` and `map`.
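A common workaround is to copy such an array-like object into a real array. Since `Array.prototype.slice` only relies on `length` and numbered properties, it can be borrowed with `call`. In this sketch, a hand-written object stands in for a `NodeList`; in a browser you could pass something like `document.body.childNodes` instead.

[source,javascript]
----
// An array-like object: numbered properties and a length,
// but none of the methods from Array.prototype.
var arrayLike = {0: "h1", 1: "p", 2: "p", length: 3};
console.log(typeof arrayLike.map);
// → undefined

// Borrow slice from Array.prototype to copy it into a real array.
var realArray = Array.prototype.slice.call(arrayLike);
var upper = realArray.map(function(name) {
  return name.toUpperCase();
});
console.log(upper.join(" "));
// → H1 P P
----

In environments that support ECMAScript 2015, `Array.from(arrayLike)` produces the same copy.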

Then there are issues that are simply the result of old-fashioned
design. For example, there is no way to create a new node and
immediately add children or attributes to it. Instead, you have to
first create it, then add the children one by one, and then set the
attributes one by one. Code that interacts heavily with the DOM tends
to get long, repetitive, and ugly.

But of course, JavaScript allows us to create our own abstractions. It
is easy to write helper functions that let you express the
operations you are performing in a clearer and shorter way. In fact,
many libraries intended for browser programming come with such
functions.
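As an illustration, here is a sketch of one such helper. The name `elt` and its argument order are invented for this example and are not taken from any library. It creates an element, sets the given attributes, and appends the given children, turning strings into text nodes along the way. It takes the document as its first argument, so it works with any object that provides `createElement` and `createTextNode`; in a browser, you would simply pass `document`.

[source,javascript]
----
// Hypothetical helper: build an element with attributes and
// children in a single call.
//   doc        - the document to create nodes in (e.g. `document`)
//   name       - the tag name, such as "p" or "a"
//   attributes - an object mapping attribute names to values
//   children   - an array of nodes or strings (strings become text)
function elt(doc, name, attributes, children) {
  var node = doc.createElement(name);
  Object.keys(attributes).forEach(function(attr) {
    node.setAttribute(attr, attributes[attr]);
  });
  children.forEach(function(child) {
    if (typeof child == "string")
      child = doc.createTextNode(child);
    node.appendChild(child);
  });
  return node;
}

// In a browser, for example:
//   document.body.appendChild(
//     elt(document, "p", {}, ["Read it ",
//       elt(document, "a", {href: "http://eloquentjavascript.net"},
//           ["here"]),
//       "."]));
----

The nested call builds the same kind of paragraph-with-link as the example page in one expression, instead of a separate statement for every node and attribute.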

== Moving through the tree ==

DOM nodes contain a wealth of links to other, nearby nodes. The
following diagram illustrates these, showing only one link of each
type.

image::img/html-links.svg[alt="Links between DOM nodes"]

Every node has a `parentNode` property that points to the node it is
part of. Every element node (node type 1) has a `childNodes` property
that holds a pseudo-array with its children. Together, these two
properties represent the fundamental structure of the tree.

In addition, there are a number of convenience links. The `firstChild`
and `lastChild` properties point to the first and last child nodes,
or have the value null for nodes without children. Similarly,
`previousSibling` and `nextSibling` point to adjacent nodes, nodes
with the same parent that appear immediately before or after the node
itself. For a first child, `previousSibling` is null, and for a
last child, `nextSibling` is null.
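These links are enough to visit every child of a node without using `childNodes` at all. The sketch below uses two hypothetical functions: `childrenOf` starts at `firstChild` and follows `nextSibling` until it reaches null, and `rootOf` climbs `parentNode` links until it finds a node without a parent. The hand-linked objects stand in for real DOM nodes, which have the same properties; for a node in a real page, `rootOf` would end at the `document` object itself.

[source,javascript]
----
// Collect the children of a node by following sibling links.
function childrenOf(node) {
  var result = [];
  for (var child = node.firstChild; child != null;
       child = child.nextSibling)
    result.push(child);
  return result;
}

// Climb parentNode links until reaching a node without a parent.
// This terminates because a tree has no cycles.
function rootOf(node) {
  while (node.parentNode != null)
    node = node.parentNode;
  return node;
}

// Hand-linked stand-ins for DOM nodes: a body with three children.
var body = {nodeName: "BODY", parentNode: null};
var h1 = {nodeName: "H1", parentNode: body, previousSibling: null};
var p1 = {nodeName: "P", parentNode: body, previousSibling: h1};
var p2 = {nodeName: "P", parentNode: body, previousSibling: p1,
          nextSibling: null};
h1.nextSibling = p1;
p1.nextSibling = p2;
body.firstChild = h1;
body.lastChild = p2;

console.log(childrenOf(body).length);
// → 3
console.log(rootOf(p2).nodeName);
// → BODY
----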

When dealing with a data structure like this, whose structure repeats
itself as we go deeper, recursive functions are often useful. The one
below scans a document for text nodes containing a given string, and
returns true when it has found one.

[sandbox="homepage"]
[source,javascript]
----
function talksAbout(node, string) {
  if (node.nodeType == 1) {
    for (var i = 0; i < node.childNodes.length; i++) {
      if (talksAbout(node.childNodes[i], string))
        return true;
    }
    return false;
  } else if (node.nodeType == 3) {
    return node.nodeValue.indexOf(string) > -1;
  }
}

console.log(talksAbout(document.body, "book"));
// → true
----

The `nodeValue` property of a text node refers to the string of text
that it represents.
| 180 | +that it represents. |
| 181 | + |
| 182 | +== Finding elements == |
| 183 | + |
| 184 | +Navigating these links to parents, children, and siblings is |
| 185 | +occasionally useful, for example in the function above, which blindly |
| 186 | +runs through the whole document. But usually, tying assumptions about |
| 187 | +the precise structure of your document into your program is a bad |
| 188 | +idea, since you might want to change that structure later. Another |
| 189 | +complicating factor is that text nodes are created even for the |
| 190 | +whitespace (newlines and spaces) between nodes. The example document's |
| 191 | +body tag does not have just three children (`<h1>` and two `<p>`’s), |
| 192 | +but actually has 7 (those three, plus the space before, after, and |
| 193 | +between them). |
| 194 | + |
| 195 | +So if we want to get the `href` attribute of the link in that |
| 196 | +document, we don't want to say something horrible like “get the second |
| 197 | +child of the sixth child of the document body”. It'd be better if we |
| 198 | +could say “get the first link in the document”. And we can. |
| 199 | + |
| 200 | +[sandbox="homepage"] |
| 201 | +[source,javascript] |
| 202 | +---- |
| 203 | +var link = document.body.getElementsByTagName("a")[0]; |
| 204 | +console.log(link.href); |
| 205 | +---- |
| 206 | + |
| 207 | +All element nodes have a `getElementsByTagName` method that retrieves |
| 208 | +a pseudo-array of all elements with the given tag name that exist |
| 209 | +inside of that element (even if they are wrapped in other nodes). |
| 210 | + |
| 211 | +To find a specific _single_ node, you can give it an `id` attribute, |
| 212 | +and use `document.getElementById` instead. |
| 213 | + |
| 214 | +[source,text/html] |
| 215 | +---- |
| 216 | +<!doctype html> |
| 217 | + |
| 218 | +<p>My ostrich Gertrude:</p> |
| 219 | +<p><img id="image" src="img/ostrich.png"></p> |
| 220 | + |
| 221 | +<script> |
| 222 | + var ostrich = document.getElementById("image"); |
| 223 | + console.log(ostrich.src); |
| 224 | +</script> |
| 225 | +---- |
| 226 | + |
| 227 | +A third, similar method is `getElementsByClassName`, which, like |
| 228 | +`getElementsByTagName`, searches through the contents of an element |
| 229 | +node, and retrieves all elements that have the given string in their |
| 230 | +`class` attribute. |
| 231 | + |
| 232 | +There exist also `getElementByTagName` and `getElementByClassName` |
| 233 | +(note, “element” is not pluralized), which instead of returning a |
| 234 | +pseudo-array, return the first element that matches, or null if none |
| 235 | +is found. |
| 236 | + |
| 237 | +== Changing the document == |
| 238 | + |
| 239 | +Almost everything about the DOM data structure can be changed. Element |
| 240 | +nodes have a number of methods for changing their content. The |
| 241 | +`removeChild` method removes the given child node from the document. |
| 242 | +To add a child, we can use `appendChild`, which puts it at the end of |
| 243 | +list of children, or `insertBefore`, which inserts the node given as |
| 244 | +first argument before the node given as second argument. |
| 245 | + |
| 246 | +[source,text/html] |
| 247 | +---- |
| 248 | +<!doctype html> |
| 249 | + |
| 250 | +<p>One</p> |
| 251 | +<p>Two</p> |
| 252 | +<p>Three</p> |
| 253 | + |
| 254 | +<script> |
| 255 | + var paragraphs = document.body.getElementsByTagName("p"); |
| 256 | + document.body.insertBefore(paragraphs[2], paragraphs[0]); |
| 257 | +</script> |
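----

A node can exist in the document in only one place. Inserting
paragraph “Three” in front of paragraph “One” therefore first removes
it from the end of the document and then inserts it at the front,
resulting in “Three/One/Two”.
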
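`removeChild` can also be combined with the `firstChild` link to empty an element. In this sketch (the function name `removeAllChildren` is made up), the loop keeps removing the current first child until there is none left; because `firstChild` always reflects the current state of the tree, the loop stops once the node is empty. The parent object is a simple stand-in that stores its children in an array; in a browser, you would pass a real element such as `document.body`.

[source,javascript]
----
// Remove every child of `node`, one at a time.
function removeAllChildren(node) {
  while (node.firstChild)
    node.removeChild(node.firstChild);
}

// A minimal stand-in for an element, storing children in an array.
var list = {
  childNodes: ["One", "Two", "Three"],
  get firstChild() { return this.childNodes[0] || null; },
  removeChild: function(child) {
    this.childNodes.splice(this.childNodes.indexOf(child), 1);
    return child;
  }
};
removeAllChildren(list);
console.log(list.childNodes.length);
// → 0
----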
Exercises:

- implement getElementsByTagName