Using PSpell to check an HTML document


orangehairedboy
04-29-2003, 12:27 PM
I would like to spell-check an HTML document, and I'm thinking about using PSpell. To do that, I need to extract a list of words from the document (stripping out the HTML tags) and put them into an array, which I would then pass to PSpell.

Does anyone know how to do that? If I wasn't too clear, here's an example:

<html>
<head>
<title>My Page</title>
</head>
<body>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
</body>
</html>

would be turned into an array I would use like:

echo $words[0]; /* prints "My" */
echo $words[1]; /* prints "Page" */
echo $words[2]; /* prints "Paragraph" */
echo $words[3]; /* prints "1" */
echo $words[4]; /* prints "Paragraph" */
echo $words[5]; /* prints "2" */

Thanks!

Lewis

missing-score
04-29-2003, 03:40 PM
I would use a Perl-compatible regular expression (preg) or HTML entities:

Here is an example:


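<?php

// Match one tag at a time: "<", anything except ">", then ">".
// A greedy "<(.*)>" would also eat the text between tags.
$html_code = "/<[^>]*>/";

// Replace each tag with a space so neighbouring words stay separate.
$page_info = preg_replace($html_code, " ", $page_info);

?>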

mordred
04-29-2003, 08:45 PM
If you want to get rid of the HTML code surrounding the words, have a look at the strip_tags() function. It's in the manual, I'm just too lazy at the moment to search for the exact link. :)
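
For reference, here is a minimal sketch of how strip_tags() could produce the word array asked for in the first post. The helper name html_to_words() is made up for illustration, and the sketch assumes words are separated by whitespace, so punctuation stays attached to the word next to it.

```php
<?php
// Hypothetical helper: turn an HTML string into a flat array of words.
function html_to_words($html)
{
    // Put a space before every "<" so "</p><p>" does not glue two words together.
    $spaced = str_replace('<', ' <', $html);

    // strip_tags() removes the markup; html_entity_decode() turns
    // entities such as &amp; back into plain characters.
    $text = html_entity_decode(strip_tags($spaced));

    // Split on runs of whitespace and drop empty pieces.
    return preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

$html = "<html><head><title>My Page</title></head>"
      . "<body><p>Paragraph 1</p><p>Paragraph 2</p></body></html>";

$words = html_to_words($html);
print_r($words); // My, Page, Paragraph, 1, Paragraph, 2
```

Each entry of $words could then be handed to pspell_check(), skipping entries such as "1" that are not really words.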

missing-score
04-29-2003, 09:27 PM
mordred, you can usually find a function's manual page by going to:

www.php.net/function-name (replace _ with -)

so yours would be:

www.php.net/strip-tags


But yeah, good one.

orangehairedboy
05-01-2003, 01:57 AM
The problem I'm running into is how to put the corrected words back into the original HTML document. Getting the words out isn't a problem, but finding those words again and replacing them is the big issue...

Does anyone have insight on that?
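
One possible approach, offered as a sketch rather than a definitive answer: instead of pulling the words out and hunting for them again later, split the document into tags and text, and correct the words in place inside the text pieces only. The function name replace_words_in_html() and the $corrections table are made up for illustration, and the code assumes a PHP version with closures (5.3 or later) and markup where no attribute value contains ">".

```php
<?php
// Hypothetical helper: run $fix over every word in the text parts of $html,
// leaving the tags themselves untouched.
function replace_words_in_html($html, $fix)
{
    // PREG_SPLIT_DELIM_CAPTURE keeps the tags as their own pieces,
    // so the result alternates between text and "<...>" markup.
    $pieces = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($pieces as $i => $piece) {
        if ($piece === '' || $piece[0] === '<') {
            continue; // a tag, or an empty gap between tags: keep as-is
        }
        // Pass each run of letters/apostrophes through the callback.
        $pieces[$i] = preg_replace_callback(
            "/[A-Za-z']+/",
            function ($m) use ($fix) { return $fix($m[0]); },
            $piece
        );
    }

    return implode('', $pieces);
}

// Stand-in for a real spell checker: a fixed table of corrections.
$corrections = array('Paragrph' => 'Paragraph');
$fix = function ($word) use ($corrections) {
    return isset($corrections[$word]) ? $corrections[$word] : $word;
};

echo replace_words_in_html('<p class="Paragrph">Paragrph 1</p>', $fix);
// prints: <p class="Paragrph">Paragraph 1</p>
```

With the pspell extension loaded, $fix could instead call pspell_check() on the word and fall back to the first entry returned by pspell_suggest(). Note that this sketch also treats entity names (the "amp" in &amp;) and the contents of script or style blocks as ordinary text, so a real version would need to skip those.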