Thread: Parsing HTML
03-01-2011, 12:03 PM | #7
Samhain13
Oooh, lots of machismo going around. "If you can prove that...."

Here's the thing. If you can get the League Results table from this page:
http://www.footballalliance.ph/league/Results.php

and return it as JSON, using only regular expressions and no SGML/HTML/XML parser of any sort, so that the League Table on this page can be updated in real time:
http://www.pinoyfootball.com/News

then perhaps we'll talk more.

And while you're at it, let's see you get the links and titles of the latest active threads from this page:
http://usapangfootball.proboards.com...=newestthreads

so that they can be rendered as a navigation list, as is done on this page:
http://www.pinoyfootball.com

Your challenge assumes that you're getting well-formed, even valid, HTML source code. But the real world is full of tag soup. What happens if the source code you're evaluating has two or three elements with the same ID? What happens when you're parsing elements that have multiple classes listed in varying order, like:

Code:
<a href="#" class="nav-link no-underline primary-link">Some String</a>
<a href="#" class="no-underline secondary-link nav-link">Another String</a>
<a href="#" class="secondary-link underline-this"><b>Yet Another String</b></a>
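
For comparison, here is a minimal sketch of how a parser-based approach might cope with the three links above. It assumes Python and its standard-library html.parser module; the names SOUP, LinkCollector and links_with_class are made up for illustration and are not from any of the sites mentioned. The point is only that the class attribute is split into tokens, so the order of the classes and the nested <b> stop mattering.

```python
from html.parser import HTMLParser

# The three links from the post, used here as sample input.
SOUP = """
<a href="#" class="nav-link no-underline primary-link">Some String</a>
<a href="#" class="no-underline secondary-link nav-link">Another String</a>
<a href="#" class="secondary-link underline-this"><b>Yet Another String</b></a>
"""


class LinkCollector(HTMLParser):
    """Collect the text of every <a> that carries a given class token."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.links = []       # text of each matching link, in document order
        self._buffer = None   # text pieces of the <a> being read, else None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # class is a whitespace-separated token list, so split it
            # instead of matching the attribute value as one string.
            classes = (dict(attrs).get("class") or "").split()
            if self.wanted_class in classes:
                self._buffer = []

    def handle_data(self, data):
        # Text inside nested tags such as <b> arrives here as well.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            self.links.append("".join(self._buffer).strip())
            self._buffer = None


def links_with_class(html, wanted_class):
    """Return the link texts of all <a> elements having wanted_class."""
    parser = LinkCollector(wanted_class)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    print(links_with_class(SOUP, "nav-link"))
    print(links_with_class(SOUP, "secondary-link"))
```

Asking for "nav-link" picks up the first two links even though the token sits at opposite ends of the attribute, and asking for "secondary-link" picks up the last two, including the one whose text is wrapped in <b>. A regular expression that tried to do the same would have to account for every ordering and every nesting by hand.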
Last edited by Samhain13; 03-01-2011 at 12:09 PM..