  1. #1
    New Coder
    Join Date
    Jun 2008
    Posts
    30
    Thanks
    7
    Thanked 0 Times in 0 Posts

    How to get the right charset/encoding?

    Hello, I am trying to parse the title from a Chinese website but I'm getting a wrong result. It seems like an encoding problem? What can I do about it?

    I need to get the title, the text on the gray background: 我和哥哥的秘密花园

    But instead it's outputting this: 脦脪潞脥赂莽赂莽碌脛脙脴脙脺禄篓脭掳


    What's wrong?

    Code:
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"><html>
    	<head>
    		<title>TEST</title>
    		<meta charset="gbk" />
    	</head>
    	
    	<body>
    		<?php
    			$dom = new DomDocument;
    			libxml_use_internal_errors(true);
    			$am_link = "http://tieba.baidu.com/p/21993922";
    			$dom->loadHTMLFile($am_link); 
    			libxml_clear_errors();
    
    
    			$xpath = new DomXpath($dom);
    			$nodes = $xpath->query('//div[@class="l_thread_title"]/descendant::h1[1]');
    			foreach ($nodes as $node)
    			{
    			  echo $node->nodeValue, "\n";
    			  echo "<br />";
    			}
    		?>
    	</body>
    </html>

  • #2
    Regular Coder dan-dan
    Join Date
    Aug 2009
    Location
    England
    Posts
    483
    Thanks
    22
    Thanked 79 Times in 78 Posts
    Maybe <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

  • #3
    New Coder
    Join Date
    Jun 2008
    Posts
    30
    Thanks
    7
    Thanked 0 Times in 0 Posts
    Quote: Originally posted by dan-dan
    Maybe <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    That gave me gibberish like ¤┬Ď╗Ď│ ╬▓Ď│
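
    A likely cause, going by the first garbled output: DOMDocument did not detect that the page is GBK and read its bytes as ISO-8859-1. nodeValue always returns UTF-8, and the test page then declared gbk, so the browser misread that UTF-8 a second time. Below is a minimal sketch of one possible fix, assuming the remote page really is GBK-encoded and that the mbstring and dom extensions are available. The function name extractThreadTitle is hypothetical. It transcodes the raw bytes to UTF-8 itself and turns non-ASCII characters into numeric entities, so the parser's charset guess no longer matters.

```php
<?php
// Hypothetical helper (not from the original post): returns the thread title
// as a UTF-8 string, or null when the XPath matches nothing.
// Assumes $rawHtml holds the page bytes exactly as downloaded, in $sourceEncoding.
function extractThreadTitle(string $rawHtml, string $sourceEncoding = 'GBK'): ?string
{
    // 1. Transcode the raw bytes to UTF-8 ourselves instead of letting libxml guess.
    $utf8 = mb_convert_encoding($rawHtml, 'UTF-8', $sourceEncoding);

    // 2. Replace every non-ASCII character with a numeric entity (e.g. &#25105;),
    //    so the markup is plain ASCII and parses the same under any charset guess.
    $ascii = mb_encode_numericentity($utf8, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    // 3. Parse quietly, then restore the caller's libxml error setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($ascii);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // 4. Same XPath as in the original post; take the first match only.
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//div[@class="l_thread_title"]/descendant::h1[1]');
    $first = $nodes->item(0);

    return $first === null ? null : trim($first->nodeValue);
}
```

    Usage might look like `echo extractThreadTitle(file_get_contents($am_link));`. Because the returned string is UTF-8, the surrounding page would need to declare `<meta charset="utf-8">` rather than gbk.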

