Shorten text and HTML 4.01 strict

02-19-2006, 01:30 AM
I asked this question in the HTML/CSS area a while back and didn't get a single idea of where to even start with this problem. Basically, I'm shortening text with a PHP function to display on a blog front page, and it cuts off closing HTML tags (like </blockquote>, </ul>, etc.).

It displays just fine, but it can cause my page to fail W3C validation as HTML 4.01 Strict. I'm wondering if there's any way to fix or prevent this, or even just an idea of how to approach the problem.

Here's my original post:


02-19-2006, 02:02 AM
Most autogenerated 'teasers' are posted without formatting (strip_tags() etc.), & I think half of the reason for this is that it's not straightforward to do what you want to do, & the other half is that the formatting used within a page may or may not work in the context of a small 'teaser'.

You could simply store a separate field (in your DB or however you are storing posts) just for the teaser, since you may often want to summarize the main content rather than grab the first $x words.

Trying to parse the content and repair it is not that easy, since there may be nested tags etc.; there is no regex one-liner to cover that.
You could possibly use a third-party sanitizer like htmlTidy, but that seems overkill to me.

A separate field for the teaser or strip_tags() would be my choice (& in that order).
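A minimal sketch of that "separate field first, strip_tags() second" order, assuming the stored teaser may be NULL or empty and the post body is UTF-8. The function name make_teaser() and the 200-character default are hypothetical, not anyone's existing code:

```php
<?php
// Hypothetical helper: prefer a hand-written teaser stored in its own
// column/field; otherwise fall back to plain text via strip_tags().
function make_teaser(?string $storedTeaser, string $body, int $maxChars = 200): string
{
    if ($storedTeaser !== null && trim($storedTeaser) !== '') {
        return $storedTeaser; // author-written, assumed to be valid HTML already
    }
    // strip_tags() removes all markup, so nothing can be left unclosed.
    $text = html_entity_decode(strip_tags($body), ENT_QUOTES, 'UTF-8');
    $text = trim(preg_replace('/\s+/u', ' ', $text));

    // Longest prefix of at most $maxChars characters ending at a word boundary.
    if (preg_match('/^.{0,' . $maxChars . '}(?=\s|$)/su', $text, $m) && $m[0] !== '') {
        $cut = $m[0];
    } else {
        // No space within the limit: hard-cut at $maxChars characters.
        preg_match('/^.{0,' . $maxChars . '}/su', $text, $m);
        $cut = $m[0];
    }
    $suffix = ($cut === $text) ? '' : '...';
    // Re-escape so characters such as & and < are valid in the page again.
    return htmlspecialchars($cut, ENT_QUOTES, 'UTF-8') . $suffix;
}
```

Because every tag is removed before cutting, the fallback can never leave a </blockquote> or </ul> dangling; the trade-off is that links and bold text are lost as well.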

02-19-2006, 04:22 AM

Thanks for your reply, firepages.

I had thought of the strip_tags option, and decided I didn't want to lose anchor tags, bold, italic, etc, in the 'teaser' as you call it.
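As a middle ground, strip_tags() accepts a second argument listing tags to keep, so inline formatting can survive while block-level tags such as <blockquote> and <ul> are removed. A small sketch; the helper name inline_only() and the exact allowed list are assumptions:

```php
<?php
// Hypothetical helper: keep only inline formatting tags in a teaser.
// Block-level tags (blockquote, ul, li, p, ...) are stripped entirely;
// attributes on the allowed tags (e.g. href) are left untouched.
function inline_only(string $html): string
{
    return strip_tags($html, '<a><b><i><em><strong>');
}

echo inline_only('<blockquote><p>See <a href="/x">this</a> and <b>that</b></p></blockquote>');
// prints: See <a href="/x">this</a> and <b>that</b>
```

Cutting the result at an arbitrary length can still split an <a> or <b>, so the kept tags would still need to be closed afterwards.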

I hadn't thought of storing a teaser in a separate field. I suppose that's the DB designer in me putting blinders on to anything that comes even close to data duplication. I'm not sure the extra work involved (an ongoing cost for every post, not a one-time cost like some magical function would be) would be worth it just to make my page 4.01 compliant.

I was also considering that since I'm parsing every character anyway with my ShortenText function, why not keep track of which tags are 'open' (recursively maybe) and close any that remain open once I've parsed a long enough teaser string?
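That idea can work. Here is a rough sketch of it, assuming reasonably well-formed input where every non-void element has an explicit end tag (HTML 4.01 allows omitting </p> and </li>, which this sketch does not handle) and no '>' appears inside attribute values. It counts an entity such as &amp; as one visible character so it never cuts one in half. The name shorten_html() and its parameters are hypothetical, not the poster's actual ShortenText function:

```php
<?php
// Hypothetical helper: cut HTML after $limit visible characters, then
// close every tag that is still open, innermost first.
function shorten_html(string $html, int $limit): string
{
    // HTML 4.01 elements that never take an end tag.
    $void = ['area', 'base', 'br', 'col', 'hr', 'img', 'input', 'link', 'meta', 'param'];
    $out = '';
    $visible = 0;
    $open = []; // stack of tag names that are currently open

    // Split into tags and text runs; DELIM_CAPTURE keeps the tags themselves.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY) ?: [];

    foreach ($parts as $part) {
        if ($visible >= $limit) {
            break; // teaser is long enough; ignore the rest
        }
        if ($part[0] === '<') {
            // Tag: push opening tags, pop on matching closing tags.
            if (preg_match('/^<(\/?)([a-zA-Z][a-zA-Z0-9]*)/', $part, $m)) {
                $name = strtolower($m[2]);
                if ($m[1] === '/') {
                    // Pop back to the most recent matching open tag.
                    for ($i = count($open) - 1; $i >= 0; $i--) {
                        if ($open[$i] === $name) {
                            array_splice($open, $i);
                            break;
                        }
                    }
                } elseif (!in_array($name, $void, true) && !preg_match('/\/\s*>$/', $part)) {
                    $open[] = $name;
                }
            }
            $out .= $part; // comments etc. pass through unchanged
            continue;
        }
        // Text: treat each character or entity (e.g. &amp;) as one visible unit.
        preg_match_all('/&#?[a-zA-Z0-9]+;|./su', $part, $tokens);
        foreach ($tokens[0] as $tok) {
            if ($visible >= $limit) {
                break 2;
            }
            $out .= $tok;
            $visible++;
        }
    }

    // Close whatever is still open, in reverse order of opening.
    while ($open) {
        $out .= '</' . array_pop($open) . '>';
    }
    return $out;
}

echo shorten_html('<blockquote><p>Hello <b>world</b> and more</p></blockquote>', 8);
// prints: <blockquote><p>Hello <b>wo</b></p></blockquote>
```

A plain stack is enough here rather than recursion: tags close in the reverse order they were opened, so popping whatever is left and emitting </name> for each gives correctly nested markup.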