So for now I'm writing a program that assumes the file contains valid HTML. I'm also assuming that there are no comments. With those assumptions... actually, couldn't I use bison or something to generate code that would parse this for me?
Probably.
Anyway, how would I go about parsing an HTML file? That's a really good question. I'd have to keep a record of which elements I am currently parsing. For example:
cat simple.html
Hello, world!
When getc() gives me the "B" in Bootstrap, then at that point my data structures should look like the following:
element elements[] =
0 -> element
    name = "!DOCTYPE html"
    attribute.name = html
    attribute.contents = ""
    older_sibling = 0;
    younger_sibling = 0;
    child = 0;
    done_parsing = true;
1 -> element
    name = "head"
    contents = ? don't know yet. Not done parsing
    older_sibling = 0;
    younger_sibling = 0;
    done_parsing = false;
    child = elementptr -> element
        element.name = title
        element.contents = ? don't know yet. Not done parsing.
        element.older_sibling = 0
        element.younger_sibling = ? don't know yet. Not done parsing.
        element.done_parsing = false;
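In C, the record above could be laid out roughly like this. It's only a sketch: the field names follow my notes, but `struct attribute`, `new_element` and `append_child` are names I'm making up here, and I'm assuming a single attribute per element for now.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* One name/contents pair, e.g. name = "html", contents = "". */
struct attribute {
    char *name;
    char *contents;
};

struct element {
    char *name;                      /* tag name, e.g. "head" */
    char *contents;                  /* text inside the element, NULL until known */
    struct attribute attribute;      /* a single attribute (assumption) */
    struct element *older_sibling;   /* previous element on the same level, or NULL */
    struct element *younger_sibling; /* next element on the same level, or NULL */
    struct element *child;           /* first child, or NULL */
    struct element *parent_element;  /* enclosing element, or NULL at the top */
    bool done_parsing;               /* true once the closing tag has been seen */
};

/* Hypothetical helper: allocate a zeroed element holding a copy of `name`.
 * Returns NULL if allocation fails. */
struct element *new_element(const char *name)
{
    assert(name != NULL);
    struct element *e = calloc(1, sizeof *e);
    if (e == NULL)
        return NULL;
    e->name = malloc(strlen(name) + 1);
    if (e->name == NULL) {
        free(e);
        return NULL;
    }
    strcpy(e->name, name);
    return e;
}

/* Hypothetical helper: attach `child` as the youngest child of `parent`. */
void append_child(struct element *parent, struct element *child)
{
    assert(parent != NULL && child != NULL);
    child->parent_element = parent;
    if (parent->child == NULL) {
        parent->child = child;
        return;
    }
    /* Walk to the current youngest child and link the new one after it. */
    struct element *last = parent->child;
    while (last->younger_sibling != NULL)
        last = last->younger_sibling;
    last->younger_sibling = child;
    child->older_sibling = last;
}
```

With this, "head" with a "title" child is `append_child(head, title)`, and `done_parsing` stays false on both until their closing tags show up.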
while ((c = getc()) != EOF):
    switch (c):
        case '<':
            parse_top_element()
        case '>':
            parse_bottom_element()

def parse_top_element():
    while ((c = getc()) != '>'):
        string += c
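Here is roughly how that loop might look in actual C. One thing I'm changing from the pseudocode: a closing tag starts with "</", so instead of switching on ">" I look at the character right after "<" to tell the two kinds apart. `scan_tags` and `struct tag_counts` are made-up names, and this sketch only counts tags instead of building elements.

```c
#include <assert.h>
#include <stdio.h>

struct tag_counts {
    int opening;   /* tags like <div> (also <br/> and <!DOCTYPE ...> in this sketch) */
    int closing;   /* tags like </div> */
};

/* Read `fp` to EOF and count opening vs. closing tags. */
struct tag_counts scan_tags(FILE *fp)
{
    assert(fp != NULL);
    struct tag_counts counts = {0, 0};
    int c;                      /* int, not char, so EOF is representable */

    while ((c = getc(fp)) != EOF) {
        if (c != '<')
            continue;           /* text between tags: ignored here */

        c = getc(fp);           /* the character after '<' decides the kind */
        if (c == EOF)
            break;              /* stray '<' at the end of the input */
        if (c == '/')
            counts.closing++;
        else
            counts.opening++;

        /* Skip the rest of the tag, up to and including '>'. */
        while (c != '>' && c != EOF)
            c = getc(fp);
    }
    return counts;
}
```

The real parser would call something like parse_top_element() and parse_bottom_element() in the two branches instead of bumping counters.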
https://stackoverflow.com/questions/11656532/returning-an-array-using-c
I'll have to dynamically increase the size of the array inside the function.
https://stackoverflow.com/questions/25798977/returning-string-from-c-function
I should probably be allocating these strings via malloc: https://www.geeksforgeeks.org/dynamically-allocate-2d-array-c/
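Putting those notes together, a function that reads up to ">" could return a malloc'd string and grow it with realloc as it goes. This is a sketch under my own assumptions: `read_until` is a made-up name, the starting capacity of 16 is arbitrary, and the caller is responsible for calling free().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read characters from `fp` until `stop` or EOF.
 * Returns a NUL-terminated, heap-allocated string without the stop
 * character, or NULL if allocation fails. The caller frees it. */
char *read_until(FILE *fp, int stop)
{
    assert(fp != NULL);
    size_t cap = 16;            /* arbitrary starting capacity */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int c;
    while ((c = getc(fp)) != EOF && c != stop) {
        if (len + 1 >= cap) {   /* always keep one byte free for '\0' */
            size_t new_cap = cap * 2;
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {  /* realloc failed: the old buffer is still ours */
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
        buf[len++] = (char)c;
    }
    buf[len] = '\0';
    return buf;
}
```

After seeing "<", a call like `read_until(fp, '>')` would hand back everything inside the tag, and the next getc() picks up right after the ">".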
<html>
<body>
<div>
<p> Hello <span> World! </span> <em> What's happening?</em> </p>
<div>
<div>
<div>
<div>
<p> cra cra How are you? </p>
<br/>
</div>
</div>
</div>
<div>
<p> What's going on here!? </p>
<br/>
<p> Hello </p>
</div>
</body>
</html>
If we reach a closing HTML element:
    element->done_parsing = true;
    element = element->parent_element;
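As a tiny C function, that step might look like the following. The struct here is cut down to just the fields this step touches, and `close_element` is a name I'm inventing for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Trimmed-down element: only the fields needed for closing a tag. */
struct element {
    const char *name;
    struct element *parent_element;  /* NULL for the top-level element */
    bool done_parsing;
};

/* Mark `current` as finished and return the element we go back to,
 * which is its parent (NULL once the top-level element is closed). */
struct element *close_element(struct element *current)
{
    assert(current != NULL);
    current->done_parsing = true;
    return current->parent_element;
}
```

The parser would keep one "current element" pointer and do `element = close_element(element);` every time it sees a closing tag.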