Old 08-17-2005, 10:40 PM   PM User | #1
gunder
how best to parse large amounts of text?

Hello everyone. I was wondering if anyone could give me some tips on how to parse large amounts of text. I play a strategy game through email: the turn report is sent to me, I write out my orders and send them back, and so on. I normally just do this in Notepad, but I figured I could write a very basic client in JavaScript. There are clients already available, but I would like to write my own for three reasons: the challenge, I don't really like any of the available clients, and I can't install anything at work, which is where I do most of it on my breaks.

What I have in mind is a text box that I could paste my turn report into, hit a button, have it parsed and then display it in an easier-to-read fashion. I'm OK with creating the nicer display and everything; I'm just trying to find an easier way to parse the text. My current report is over 1500 lines long and getting longer each turn. Here is a small snippet of my report so you can see what I'm working with:

Faction Status:
Tax Regions: 4 (24)
Trade Regions: 6 (10)
Mages: 2 (2)

Errors during turn:
Dalesor Reavers (32264): MOVE: Unit has insufficient movement
points; remaining moves queued.

Events during turn:
Joss (377): Claims $100.
Mernic (1345): Claims $100.
Guards (6394): Gives 80 silver [SILV] to Fighters (6521).

That's just a small portion of the type of stuff I would be dealing with. I'm guessing it would be easiest to use indexOf() and split(), but I'm a little lost as to how to grab all the correct info. For example, under "Faction Status" there are only those three lines, and the only thing that changes is the numbers. The "Errors during turn" and "Events during turn" sections change constantly, so how could I make sure to grab all of the info each time, and make sure that's all I'm grabbing?

I'm sorry if this isn't making much sense. Basically, I just need to know the best way to parse large amounts of text. The book I have doesn't really cover it, and I couldn't find anything too useful through a Google search. If anyone has any ideas I would really appreciate it.

-gunder
Old 08-17-2005, 11:16 PM   PM User | #2
martin_narg
If you were to do this using JavaScript, a regular expression would be the best way. I would certainly suggest pasting the text to be formatted into a textarea, rather than loading the file into memory using an ActiveX control or equivalent.
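
As a rough sketch of how that could look for a report like the one in the first post: split the text into lines, treat a line that ends in a colon as a section header, and fold wrapped lines back into the entry they belong to. The function names (parseReport, parseEntry, parseStatus) are made up for illustration, and the line formats are guessed from the posted snippet alone, so the real report may well need different patterns.

```javascript
// Sketch only: parseReport, parseEntry and parseStatus are hypothetical names,
// and the patterns are guessed from the short sample in the first post.
function parseReport(text) {
  var sections = {};      // section title -> array of entry strings
  var current = null;     // entries of the section currently being read
  var lines = text.split(/\r?\n/);

  for (var i = 0; i < lines.length; i++) {
    var line = lines[i].replace(/^\s+|\s+$/g, "");   // trim both ends
    if (line === "") { continue; }                   // skip blank lines

    // Assumed: a section header is a line ending in a colon,
    // e.g. "Events during turn:"
    var header = /^([^:]+):$/.exec(line);
    if (header) {
      current = sections[header[1]] = [];
    } else if (current) {
      if (/^[^:]+: /.test(line) || current.length === 0) {
        current.push(line);                          // "Label: ..." starts an entry
      } else {
        current[current.length - 1] += " " + line;   // wrapped continuation line
      }
    }
  }
  return sections;
}

// "Guards (6394): Gives 80 silver..." -> unit name, unit number, message
function parseEntry(entry) {
  var m = /^(.+?) \((\d+)\): (.*)$/.exec(entry);
  return m ? { name: m[1], id: Number(m[2]), message: m[3] } : null;
}

// "Tax Regions: 4 (24)" -> label and the two numbers (their meaning is not
// stated in the thread, so they are just called first and second here)
function parseStatus(entry) {
  var m = /^(.+): (\d+) \((\d+)\)$/.exec(entry);
  return m ? { label: m[1], first: Number(m[2]), second: Number(m[3]) } : null;
}
```

Calling parseReport(textarea.value) would then give one array per section, and each entry can be passed through parseEntry or parseStatus before it is displayed. Keeping the section splitting separate from the per-line patterns means a new kind of line only needs one more small pattern.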

How you do it depends on how large the files are. If they are less than 100k, use JavaScript without a second thought. For larger files (MB and above), I would recommend Perl or C, as these languages will manipulate the files more efficiently at a low level.

You can do a rough speed test by running a regular expression, such as the one below, on some text. Paste HTML page sources of varying sizes into the first textarea to find out how much text can be processed efficiently:
Code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>Untitled Document</title>
<script type="text/javascript">
function cleanHTML(objIn, objOut) {
	// Match every tag and decide, tag by tag, what to replace it with.
	objOut.value = objIn.value.replace(/<[^>]*>/g, function tagMatch(s) {
			if(/^<img\b/i.test(s)) { // leave images alone (any letter case)
				return s;
			} else if(s.charAt(1) == "/") { // add a newline after closing tags
				return "\n";
			} else {
				return ""; // clean out all opening tags
			}
		}
	);
}
</script>
</head>

<body>
<form name="frm" onsubmit="cleanHTML(this.txtInput, this.txtOutput);return false;">
<textarea name="txtInput" cols="100" rows="10">
<p>Hello world</p>
<h3>line 2!</h3>
<br>
Line 3!
<img src="img.gif">
</textarea>
<br><br>
<textarea name="txtOutput" cols="100" rows="10"></textarea>
<br><br>
<input type="submit" name="submit" value="submit">
</form>
</body>
</html>
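
The page above only runs the replace; it does not report how long it took. A minimal way to put a number on it, assuming a hypothetical helper called timeReplace (not part of the page above) and console.log being available (a modern browser console or Node), might be:

```javascript
// Sketch only: timeReplace is a hypothetical helper for rough timing.
// Millisecond timestamps are coarse, so small inputs will often report 0 ms.
function timeReplace(text, pattern, replacement) {
  var start = new Date().getTime();
  var result = text.replace(pattern, replacement);
  var elapsed = new Date().getTime() - start;
  return { elapsedMs: elapsed, length: result.length };
}

// Build inputs of increasing size by repeating a small chunk of markup.
var chunk = "<p>Hello world</p>\n<h3>line 2!</h3>\n";
var repeats = [100, 1000, 10000];
for (var i = 0; i < repeats.length; i++) {
  var text = new Array(repeats[i] + 1).join(chunk);   // chunk repeated n times
  var timing = timeReplace(text, /<[^>]*>/g, "");
  console.log(text.length + " chars: " + timing.elapsedMs + " ms");
}
```

The numbers will vary a lot between browsers and machines, so treat them as a relative comparison between input sizes rather than a benchmark.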
Hope this helps

m_n
__________________
"Cos it's strange isn't it. You stand in the middle of a library and go 'Aaaaaaaaaaaaaaaaggggggghhhhhhh!'
and everybody just stares at you. But you do the same in an aeroplane, and everybody joins in."
-Tommy Cooper
Old 08-17-2005, 11:40 PM   PM User | #3
gunder
Thank you for the reply. I'll play around with what you suggested and see what I can come up with.

-gunder