
View Full Version : Trim and format plain text



four0four
05-08-2009, 12:07 AM
I have the following script that converts line breaks in plain text into HTML-formatted paragraphs. It takes plain text from one textarea field and outputs the formatted text into another textarea field.




function convertText(){

var noBreaks = document.getElementById("oldText").value;
noBreaks = noBreaks.replace(/\r\n/g,"[-LB-]");

re4 = /\[-LB-\]\[-LB-\]/gi;
noBreaks = noBreaks.replace(re4,"</p><p>");

re5 = /\[-LB-\]/gi;
noBreaks = noBreaks.replace(re5," ");

noBreaks ='<p>'+noBreaks+'</p>';

noBreaks = noBreaks.replace(/<\/p><p>/g,"</p>\r\n\r\n<p>");

document.getElementById("newCode").value = noBreaks;

}



<textarea id="oldText" name="oldText" rows="12" cols="90"></textarea>

<textarea id="newCode" name="newCode" rows="12" cols="90"></textarea>

<input type="button" value="Convert" onclick="javascript:convertText()">



1. How can I trim/filter all double spaces, triple spaces, and so on? I only want single spaces to exist. Also, is there a way to remove spaces from the very beginning and end of each paragraph?

2. If there's a space (not a line break or carriage return) between two paragraphs, it merges them into one paragraph instead of creating two separate paragraphs. Is there a way to fix this?

Thanks!

venegal
05-08-2009, 12:48 AM
1.) To collapse any run of consecutive spaces into a single space, add

noBreaks = noBreaks.replace(/ +/g, " ");

For removal of spaces at the beginning and end of paragraphs, add

noBreaks = noBreaks.replace(/<p> *(.*?) *<\/p>/g, "<p>$1</p>");


Also, you may want to change the first one to

noBreaks = noBreaks.replace(/\r|\n/g,"[-LB-]");

so it matches both types of line breaks independently.

2.) Don't know what you mean there. If there's a space and not a line break, there's no new paragraph, so why should your script create one?
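
As a quick illustration of the two replacements from 1.) above, here is a minimal sketch. The helper name tidySpaces and the sample string are made up for this example; it assumes the text has already been wrapped in <p>...</p> tags, as the original script does before this point.

```javascript
// Hypothetical helper (not part of the original script): applies the two
// replacements from 1.) to a string that is already wrapped in <p> tags.
function tidySpaces(html) {
  // Collapse every run of spaces into a single space.
  html = html.replace(/ +/g, " ");
  // Drop spaces that sit directly inside the <p> and </p> delimiters.
  html = html.replace(/<p> *(.*?) *<\/p>/g, "<p>$1</p>");
  return html;
}

var sample = "<p>  too   many    spaces  </p>";
console.log(tidySpaces(sample)); // <p>too many spaces</p>
```

Note that "." does not match line breaks in JavaScript, so the second replacement only trims paragraphs that contain no \r or \n between the tags.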

four0four
05-08-2009, 01:30 AM
Thanks! Those work great.


Regarding your suggestion to change the first one to

noBreaks = noBreaks.replace(/\r|\n/g,"[-LB-]");

so it matches both types of line breaks independently:

I tried this, but it kept adding empty <p></p> tags between the paragraphs.


Regarding 2.), where you asked why the script should create a new paragraph when there's a space and not a line break:

For example, if there are two paragraphs that look like this:

plain text paragraph1
[single or multiple spaces]
plain text paragraph2

it outputs the following:

<p>plain text paragraph1 plain text paragraph2</p>

...instead of:

<p>plain text paragraph1</p>

<p>plain text paragraph2</p>


Thanks again!

venegal
05-08-2009, 02:09 AM
Ok, I see now what you mean. But you can't use /\r\n/g, because you can't expect that every line break consists of a carriage return followed by a line feed.

I'd suggest changing the first few lines to the following, which will solve both your problems:

var noBreaks = document.getElementById("oldText").value;
noBreaks = noBreaks.replace(/\r|\n/g,"[-LB-]");

noBreaks = noBreaks.replace(/\s*(\[-LB-\])/g, "$1");
noBreaks = noBreaks.replace(/(\[-LB-\]){2,}/g, "</p><p>");
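
One thing worth keeping in mind with /\r|\n/g: if the textarea value contains Windows-style \r\n breaks, each single line break produces two markers, which the {2,} rule then treats as a paragraph break. The sketch below only illustrates that difference. The /\r\n|\r|\n/g alternative is an assumption of mine and is not used elsewhere in this thread.

```javascript
var windowsText = "one\r\ntwo";

// Matching \r and \n independently: one Windows line break becomes two markers.
var independent = windowsText.replace(/\r|\n/g, "[-LB-]");

// Trying the \r\n pair first: one marker per line break, whatever its style.
var paired = windowsText.replace(/\r\n|\r|\n/g, "[-LB-]");

console.log(independent); // one[-LB-][-LB-]two
console.log(paired);      // one[-LB-]two
```

With the paired pattern, a single line break inside a paragraph would become a space rather than a paragraph break, so which pattern is preferable depends on how single line breaks should be treated.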

four0four
05-08-2009, 03:17 AM
Thanks! That solved the problem.

However, it now adds empty <p></p> tags if there's a single line break before or after the text.

For example:

[*single line break*]
plaintext paragraph1
[*single line break*]
plaintext paragraph2
[*single line break*]

will output the following:

<p></p>
<p>plaintext paragraph1</p>
[*single line break*]
<p>plaintext paragraph2</p>
<p></p>

Here's what I have so far:



function convertText(){

var noBreaks = document.getElementById("oldText").value;

noBreaks = noBreaks.replace(/\r|\n/g,"[-LB-]");

noBreaks = noBreaks.replace(/\s*(\[-LB-\])/g, "$1");

noBreaks = noBreaks.replace(/(\[-LB-\]){2,}/g, "</p><p>");

re4 = /\[-LB-\]\[-LB-\]/gi;
noBreaks = noBreaks.replace(re4,"</p><p>");

re5 = /\[-LB-\]/gi;
noBreaks = noBreaks.replace(re5," ");

noBreaks ='<p>'+noBreaks+'</p>';

noBreaks = noBreaks.replace(/<\/p><p>/g,"</p>\r\n\r\n<p>");

noBreaks = noBreaks.replace(/ +/g, " ");

noBreaks = noBreaks.replace(/<p> *(.*?) *<\/p>/g, "<p>$1</p>");

document.getElementById("newCode").value = noBreaks;

}


Am I doing something wrong?

Thanks again! I really appreciate your help.

venegal
05-08-2009, 12:25 PM
I see. This one will work then:

function convertText(){
var noBreaks = document.getElementById("oldText").value;

noBreaks = noBreaks.replace(/\r|\n/g,"[-LB-]");
noBreaks = noBreaks.replace(/^(\[-LB-\])*/, "");
noBreaks = noBreaks.replace(/(\[-LB-\])*$/, "");
noBreaks = noBreaks.replace(/\s*(\[-LB-\])/g, "$1");
noBreaks = noBreaks.replace(/(\[-LB-\]){2,}/g, "</p><p>");

var re5 = /\[-LB-\]/gi;
noBreaks = noBreaks.replace(re5," ");

noBreaks ='<p>'+noBreaks+'</p>';
noBreaks = noBreaks.replace(/<\/p><p>/g,"</p>\r\n\r\n<p>");
noBreaks = noBreaks.replace(/ +/g, " ");
noBreaks = noBreaks.replace(/<p> *(.*?) *<\/p>/g, "<p>$1</p>");

document.getElementById("newCode").value = noBreaks;
}

I removed the

re4 = /\[-LB-\]\[-LB-\]/gi;
noBreaks = noBreaks.replace(re4,"</p><p>");
because that's already done by the code I gave you before, and added

noBreaks = noBreaks.replace(/^(\[-LB-\])*/, "");
noBreaks = noBreaks.replace(/(\[-LB-\])*$/, "");

to remove line breaks at the beginning and the end.
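
To see what the two anchored replacements do in isolation, here is a small sketch. The function name stripOuterMarkers and the sample string are hypothetical; the two patterns are the ones shown above.

```javascript
// Hypothetical helper: removes [-LB-] markers from both ends of a string
// while leaving the markers in the middle untouched.
function stripOuterMarkers(marked) {
  // ^ anchors the match to the start, so only leading markers are removed.
  marked = marked.replace(/^(\[-LB-\])*/, "");
  // $ anchors the match to the end, so only trailing markers are removed.
  marked = marked.replace(/(\[-LB-\])*$/, "");
  return marked;
}

console.log(stripOuterMarkers("[-LB-][-LB-]text[-LB-]more[-LB-]")); // text[-LB-]more
```

Neither pattern needs the g flag, because each can match at most once at its end of the string.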

four0four
05-09-2009, 01:14 AM
Excellent! Thank you so much!

One last question: it creates empty <p></p> tags if one of these line breaks is followed by one or more spaces. Can that be fixed?

For example:

[*single line break*][space]
plaintext paragraph1
[*single line break*]
plaintext paragraph2
[*single line break*][space]

will output the following:

<p></p>
<p>plaintext paragraph1</p>
[*single line break*]
<p>plaintext paragraph2</p>
<p></p>


Many thanks!

venegal
05-09-2009, 07:16 PM
Ah, I see. The steps just need to run in a different order. Change

noBreaks = noBreaks.replace(/^(\[-LB-\])*/, "");
noBreaks = noBreaks.replace(/(\[-LB-\])*$/, "");
noBreaks = noBreaks.replace(/\s*(\[-LB-\])/g, "$1");

to

noBreaks = noBreaks.replace(/\s*(\[-LB-\])/g, "$1");
noBreaks = noBreaks.replace(/^(\[-LB-\])*/, "");
noBreaks = noBreaks.replace(/(\[-LB-\])*$/, "");


I think we got it now.
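
For reference, here is a consolidated sketch of the function with the steps in the new order. It uses the same replacements discussed in this thread. The name convertPlainText is made up, and it takes and returns a string instead of reading the textareas, which is an assumption made so the logic can be tried outside a page.

```javascript
// Sketch only: same steps as convertText(), but as a string-in, string-out
// function. The name convertPlainText is hypothetical.
function convertPlainText(text) {
  // Mark every \r and every \n (note: a \r\n pair yields two markers).
  var s = text.replace(/\r|\n/g, "[-LB-]");
  // Remove whitespace that directly precedes a marker.
  s = s.replace(/\s*(\[-LB-\])/g, "$1");
  // Remove markers at the very beginning and the very end.
  s = s.replace(/^(\[-LB-\])*/, "");
  s = s.replace(/(\[-LB-\])*$/, "");
  // Two or more markers in a row separate paragraphs.
  s = s.replace(/(\[-LB-\]){2,}/g, "</p><p>");
  // Any remaining single marker becomes a space.
  s = s.replace(/\[-LB-\]/g, " ");
  s = "<p>" + s + "</p>";
  // Put a blank line between paragraphs in the output.
  s = s.replace(/<\/p><p>/g, "</p>\r\n\r\n<p>");
  // Collapse runs of spaces, then trim spaces inside each paragraph.
  s = s.replace(/ +/g, " ");
  s = s.replace(/<p> *(.*?) *<\/p>/g, "<p>$1</p>");
  return s;
}

var input = "\n \nfirst  paragraph\n\n  second paragraph \n ";
console.log(convertPlainText(input));
```

In the page itself, this could be called as document.getElementById("newCode").value = convertPlainText(document.getElementById("oldText").value);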

four0four
05-09-2009, 09:42 PM
Everything works great! Thank you very much for your help! :)


