...

View Full Version : How to get the right charset/encoding?



QueenZ
03-03-2012, 01:41 PM
Hello, I am trying to parse the title from a Chinese website, but I'm getting the wrong result. It looks like an encoding problem. What can I do about it?

I need to get the title, the text on the gray background: 我和哥哥的秘密花园

But instead it's outputting this: 脦脪潞脥赂莽赂莽碌脛脙脴脙脺禄篓脭掳


What's wrong?


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"><html>
<head>
<title>TEST</title>
<meta charset="gbk" />
</head>

<body>
<?php
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$am_link = "http://tieba.baidu.com/p/21993922";
$dom->loadHTMLFile($am_link);
libxml_clear_errors();


$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//div[@class="l_thread_title"]/descendant::h1[1]');
foreach ($nodes as $node)
{
    echo $node->nodeValue, "\n";
    echo "<br />";
}
?>
</body>
</html>

dan-dan
03-03-2012, 02:10 PM
Maybe <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

QueenZ
03-03-2012, 02:42 PM
Quote from dan-dan: Maybe <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

That gave me gibberish like һҳ βҳ
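
The garbled title in the first post is what you would see if the GBK bytes of the page were read as ISO-8859-1 by DOMDocument, returned as UTF-8 by nodeValue, and then displayed by a browser that was told the page is GBK. One possible way around this, as a rough sketch only: convert the downloaded bytes from GBK to UTF-8 yourself, turn every non-ASCII character into a numeric entity so the parser cannot misread it, and serve your own page as UTF-8. The function name extractTitle and the $sourceCharset parameter are made up for illustration, and the sketch assumes the remote page really is GBK and that the mbstring extension is installed.

```php
<?php
// Hypothetical helper (not from the original post): returns the thread
// title as a UTF-8 string, or null if the XPath matches nothing.
// Assumes $rawHtml holds the page bytes exactly as downloaded.
function extractTitle(string $rawHtml, string $sourceCharset = 'GBK'): ?string
{
    // Step 1: convert the raw bytes from the source charset to UTF-8.
    $utf8 = mb_convert_encoding($rawHtml, 'UTF-8', $sourceCharset);

    // Step 2: replace every character above ASCII with a numeric entity,
    // so the markup handed to the parser is plain ASCII.
    $ascii = mb_encode_numericentity($utf8, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    // Step 3: parse and query as in the original code.
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($ascii);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//div[@class="l_thread_title"]/descendant::h1[1]');

    // nodeValue is UTF-8, so the page that echoes it should declare UTF-8.
    return $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;
}
```

Usage might look like echo extractTitle(file_get_contents($am_link)); with the outer page declaring <meta charset="utf-8" /> instead of gbk, since the string coming out of the DOM is UTF-8.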


