Shorten a HTML Formatted String ignoring the HTML?


aspdotnetguy
07-18-2002, 04:21 PM
Hello,

I am attempting to shorten HTML-formatted strings, but I want to ignore the HTML and shorten only the text. For example:

Say I want to restrict a string's length to 10 characters and I have this string:

<p>1234567890</p>
12345678901234567 --17 chars long so I have to cut 7 off

When they are cut off it should look like this
----------------
<p>123</p>
1234567890
----------------
Not like this
----------------
<p>1234567
1234567890
----------------

Note: The numbers underneath are there only to illustrate the number of characters.

I created a function that does not delete any characters between < and >, but it is buggy at best, and it presents a problem in cases where a user might type in

<p>hello this is an example of angle brackets <anglebrackets></p>

Because it should delete those, right? I have searched extensively for a JavaScript method that will determine whether a character (charAt()) is part of an HTML tag, but I cannot find anything. I am not sure what the best way to approach this is.

I was wondering if there is a JavaScript method I have somehow missed, or if any of you have ever done anything like this.

Any help is greatly appreciated.

Thank You

jkd
07-18-2002, 06:14 PM
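var str = '<p>1234567890</p>';
// A replacement function receives the captured text of this match;
// RegExp.$1 in the argument list would be read before replace() runs.
var shortened = str.replace(/<p>([^<]*)<\/p>/, function (match, text) {
    return '<p>' + text.substring(0, 3) + '</p>';
});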

:)

aspdotnetguy
07-18-2002, 07:22 PM
I need it to take into account every HTML tag :(

aspdotnetguy
07-18-2002, 07:34 PM
Impressive Tho...

jkd
07-18-2002, 08:02 PM
Nested tags will be a real problem if you are treating this as a simple string. But this works for any simple XML element whose only child is text:

theString.replace(/<((?:[a-z]+:)?[a-z]+)((?: (?:[a-z]+:)?[a-z]+=['"][^'"]*['"])*)>([^<]*)<\/\1>/gi, function (match, name, attrs, text) {
    return '<' + name + attrs + '>' + text.substring(0, 3) + '</' + name + '>';
});

Returns a modified string where the child text node is now length 3.

If you want to parse the string into a DOM object, I could think of a way utilizing the DOM2 Traversal TreeWalker interface to easily trim the string. Of course, only Gecko supports TreeWalker...
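
For the nested-tag case without a DOM, one possible approach is to walk the string token by token, copy every tag through untouched, count only the text characters, and close whatever elements are still open at the cut point. The following is only a rough sketch under some assumptions: the function name truncateHtml and the short list of void tags are made up for illustration, the markup is assumed to be reasonably well formed, and attribute values are assumed not to contain ">".

```javascript
// Hypothetical helper, not from the thread: keeps all tags, keeps at most
// maxChars characters of text, and closes elements left open at the cut.
function truncateHtml(html, maxChars) {
    // A token is a tag (<...>), a run of text, or a stray "<" treated as text.
    var tokenPattern = /<\/?([a-z][a-z0-9:]*)[^>]*>|[^<]+|</gi;
    // Assumed (incomplete) list of elements that never take a closing tag.
    var voidTags = { br: true, hr: true, img: true, input: true, meta: true, link: true };
    var openTags = [];          // stack of element names awaiting a closing tag
    var out = '';
    var remaining = maxChars;   // text characters we are still allowed to emit
    var token;

    while (remaining > 0 && (token = tokenPattern.exec(html)) !== null) {
        var piece = token[0];
        var name = token[1];    // undefined when the token is text
        if (name) {
            out += piece;       // tags never count against the limit
            name = name.toLowerCase();
            if (piece.charAt(1) === '/') {
                // Closing tag: unwind the stack to the matching open element.
                var idx = openTags.lastIndexOf(name);
                if (idx !== -1) {
                    openTags.length = idx;
                }
            } else if (!voidTags[name] && piece.charAt(piece.length - 2) !== '/') {
                openTags.push(name);
            }
        } else {
            out += piece.substring(0, remaining);
            remaining -= Math.min(piece.length, remaining);
        }
    }

    // Close anything still open so the result stays balanced.
    while (openTags.length > 0) {
        out += '</' + openTags.pop() + '>';
    }
    return out;
}

var shortened = truncateHtml('<p>1234567890</p>', 3);   // '<p>123</p>'
var nested = truncateHtml('<p>Hello <b>big world</b></p>', 8);
```

This still cannot tell literal text such as <anglebrackets> apart from a real tag, since anything shaped like a tag is copied through as one. It also counts an entity like &amp; as five characters. Treat it as a starting point rather than a full parser.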