StructuredText

StructuredText is a subclass of Text that can store extra information for each character in the text. It keeps three parallel arrays: string, emphasis, and structures. The i'th element of each array describes the i'th character: the string array holds the character itself, the emphasis array holds its emphasis, and the structures array holds its additional information. In NetFish, this additional information is a collection of WebComposites and WebComponents.
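The parallel-array layout can be sketched in a workspace with plain collections. This is only an illustration: the variable names and the #someNode symbol are assumptions, and the code does not use the actual StructuredText instance variables.

```smalltalk
"Sketch only: three plain collections stand in for the parallel arrays."
| string emphasis structures i |
string := 'example'.
"One emphasis slot per character; nil means no emphasis here."
emphasis := Array new: string size withAll: nil.
"One collection of nodes per character."
structures := (1 to: string size) collect: [:k | OrderedCollection new].
"Attach a placeholder node to the third character."
(structures at: 3) add: #someNode.
"Everything known about character i sits at index i in each array."
i := 3.
Transcript show: (string at: i) printString; cr.
Transcript show: (emphasis at: i) printString; cr.
Transcript show: (structures at: i) printString; cr.
```

Keeping the arrays the same length means a single index is enough to retrieve the character, its emphasis, and its structures together.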

Figure 1 shows a short segment of HTML and its parse tree. Each node has a scope on the string: the scope of WebComposite(#h1) is the whole string, while the scope of WebComposite(#b) is the word 'example'. Figure 2 shows the scope of each node. StructuredText can store the inverted scope information, that is, which nodes have a scope covering a given character. For the first character, the answer is WebComposite(#h1) and a WebComponent(#text).

Figure 1 The parse tree of an HTML segment

Figure 2 The scope of each node

To build the scopes of WebComposite(#h1) and WebComposite(#b), assuming wch1 and wcb represent the two nodes respectively, you can use:

    | s t |
    s := 'This is an example for StructuredText'.
    t := StructuredText fromString: s.
    "wch1 covers the whole string."
    t structureAllWith: wch1.
    "Characters 12 to 18 spell 'example', the scope of wcb."
    t structureFrom: 12 to: 18 with: wcb.
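The inverted scope lookup described earlier can be sketched the same way. The example below is a simplified model that assumes plain collections and uses the symbols #h1 and #b in place of the real nodes; it does not call any StructuredText query method, since none is named here, and it leaves out the WebComponent(#text) nodes.

```smalltalk
"Sketch only: symbols stand in for the WebComposite nodes."
| s structures |
s := 'This is an example for StructuredText'.
structures := (1 to: s size) collect: [:i | OrderedCollection new].
"The scope of #h1 is the whole string."
1 to: s size do: [:i | (structures at: i) add: #h1].
"The scope of #b is characters 12 to 18, the word 'example'."
12 to: 18 do: [:i | (structures at: i) add: #b].
"Inverted lookup: which nodes cover a given character?"
Transcript show: (structures at: 1) printString; cr.
Transcript show: (structures at: 12) printString; cr.
```

In this model the first character is covered only by #h1, while character 12 is covered by both #h1 and #b.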