08-23-2011, 10:42 PM
If I have a URL and the page at that URL has certain ids or classes on a div or span, does anyone have good code for finding them? For example, I want to find a span like <span class="street-address">1234 Old Road</span> and get the text inside it. Another scenario is something like <div id="bizUrl"><a href="http://www.codingforums.com">Coding Forums</a></div>, where I'd like to take the value of the href attribute.
Any ideas on the best way to do something like this?
Thanks in advance!
08-23-2011, 10:52 PM
Never tried it, but it appears to be quite handy. I would imagine the following syntax would work:
$html = file_get_html( 'http://www.example.com' );
$text_from_class = $html->find( '.street-address', 0 )->innertext;
$text_from_id = $html->find( '#bizUrl a', 0 )->href;
Untested... but if the selectors work just like jQuery, as they claim, that should work.
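For comparison, here is a minimal sketch of the same two lookups using PHP's built-in DOMDocument and DOMXPath classes instead of the library above. The markup is the sample from the question pasted into a string; a real page would have to be fetched first (for example with file_get_contents(), assuming allow_url_fopen is enabled). The variable names are only illustrative.

```php
<?php
// Sample markup copied from the question; not fetched from a live site.
$markup = '<html><body>'
    . '<span class="street-address">1234 Old Road</span>'
    . '<div id="bizUrl"><a href="http://www.codingforums.com">Coding Forums</a></div>'
    . '</body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Text of the first span whose class list contains "street-address".
$spans = $xpath->query(
    '//span[contains(concat(" ", normalize-space(@class), " "), " street-address ")]'
);
$street = $spans->length > 0 ? trim($spans->item(0)->textContent) : null;

// href of the first link inside the element with id="bizUrl".
$links = $xpath->query('//div[@id="bizUrl"]//a[@href]');
$href = $links->length > 0 ? $links->item(0)->getAttribute('href') : null;

echo $street, "\n", $href, "\n";
```

Both lookups fall back to null when nothing matches, which avoids calling a method on a missing node.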
08-23-2011, 11:13 PM
Thanks, I'll try that. What about info in a meta tag, for example:
<meta property="og:type" content="restaurant">
<meta property="og:longitude" content="-87.630765">
08-23-2011, 11:24 PM
08-23-2011, 11:27 PM
Thank you for the help. I'll let you know if I have any problems.
08-24-2011, 12:16 AM
Ok, for the meta tags it didn't find them because they use a property attribute instead of name. get_meta_tags() only looks at the name attribute. Is there any way I can get the content values keyed by property, like in my example?
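One possible approach, sketched here with PHP's built-in DOMDocument rather than get_meta_tags(): walk every meta element and collect those that carry a property attribute. The markup is just the two sample tags from above wrapped in a minimal page, and $og is an illustrative variable name.

```php
<?php
// The two sample meta tags from this thread, wrapped in a minimal page.
$markup = '<html><head>'
    . '<meta property="og:type" content="restaurant">'
    . '<meta property="og:longitude" content="-87.630765">'
    . '</head><body></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($markup);
libxml_clear_errors();

// Map each property value to its content value.
$og = array();
foreach ($doc->getElementsByTagName('meta') as $meta) {
    if ($meta->hasAttribute('property')) {
        $og[$meta->getAttribute('property')] = $meta->getAttribute('content');
    }
}

print_r($og);
```

Meta tags that only have a name attribute are skipped by this loop, so it complements get_meta_tags() rather than replacing it.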
08-24-2011, 02:35 PM
Yeah, it wouldn't work with the <h1> tag, so that is why I opened up a new tag. Anyway, I did have a question on that which I couldn't find an answer to. Based on what you saw, can you get a result back and then do another find on that result? It didn't look like that was an option, but that would be helpful.
When I do something like the following:
$biz_info_content = $biz_page->find( 'div#bizInfoContent', 0 )->outertext;
$cat_display = $biz_info_content->find( 'span#cat_display', 0 )->outertext;
I get an error saying: Fatal error: Call to a member function find() on a non-object
The $biz_page in this example is the original URL I'm trying to crawl.
08-24-2011, 03:07 PM
We will need more code than that... it doesn't appear the variables are a directly returned value from the file_get_html() function.
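One thing that does stand out in the two lines quoted: ->outertext returns a plain string, so $biz_info_content is no longer an object and has no find() method, which would match the "non-object" error. In the library, the element returned by find() can itself be searched, so dropping ->outertext from the first line and reading it only at the end should help. The same idea, sketched with PHP's built-in DOMXPath and a made-up sample page (only the two ids come from the post; the category text is invented):

```php
<?php
// Invented sample markup; only the ids bizInfoContent and cat_display come from the post.
$markup = '<html><body><div id="bizInfoContent">'
    . '<span id="cat_display">Restaurants</span>'
    . '</div></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($markup);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Step 1: keep the node itself, not its serialized markup.
$bizInfo = $xpath->query('//div[@id="bizInfoContent"]')->item(0);

$catDisplay = null;
if ($bizInfo !== null) {
    // Step 2: search relative to that node; the leading "." scopes the query to it.
    $cat = $xpath->query('.//span[@id="cat_display"]', $bizInfo)->item(0);
    if ($cat !== null) {
        // Step 3: serialize only at the very end.
        $catDisplay = $doc->saveHTML($cat);
    }
}

echo $catDisplay, "\n";
```

The null checks matter for the same reason as the original error: if the first lookup finds nothing, there is no object to search in the second step.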