Find and Replace + more in C++????


bazpaul
06-23-2005, 01:22 PM
Hey, great site by the way, it's extremely helpful. I'm wondering if you could help me too!

I am currently sorting my music collection and am saving biographies of artists as .txt files from web pages. I then want to clean up all these txt files, i.e. remove all of the junk, links, etc., to leave a simple txt file with a biography of each artist. Each txt file has the same groups of words and paragraphs, so I figure it would be easier to write a program that could find groups of text and delete them. Here's a sample from the top of a txt file that I'd want to remove:

Username [E-mail Address]
Password [Forgot Password?]
Log me in automatically. Home Search Name Album Song Classical Work
Help Center
Questions?

I was going to download a program to do this, but could only find string-replacing programs. It is important that if a txt file does not have, let's say, the above paragraph, then the program skips it and moves on to the next paragraph to remove. Many programs I found can't do this. Also, I know how to read a file, but does anyone know how to read a folder, so that all txt files in that folder will be read?

Here's some code I have for reading a file, but I just need to adapt it to search for and remove paragraphs.


#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>

using namespace std;

int main()
{
    // C File Input
    //--------------
    FILE *inFile = fopen("in.txt", "r");

    if (!inFile)
    {
        printf("Cannot find in.txt\n");
        return 1;
    }

    // fgetc returns an int so that EOF can be told apart from a real character;
    // without the EOF check a file with no newline would loop forever
    int ch;
    while ((ch = fgetc(inFile)) != EOF && ch != '\n')
        cout << static_cast<char>(ch);

    cout << endl;

    // read the second line (at most 99 characters)
    char line[100];
    if (fgets(line, sizeof line, inFile))
        cout << line;

    // read three integers; print them only if all three were parsed
    int a = 0, b = 0, c = 0;
    if (fscanf(inFile, "%d %d %d", &a, &b, &c) == 3)
        cout << a << " " << b << " " << c << endl;

    fclose(inFile);

    cout << endl;

    // C++ File Input
    //----------------
    ifstream fin("in.txt");

    if (!fin)
    {
        cout << "Cannot find in.txt" << endl;
        return 1;
    }

    while ((ch = fin.get()) != EOF && ch != '\n')
        cout << static_cast<char>(ch);

    cout << endl;

    fin.getline(line, sizeof line);
    cout << line << endl;

    a = b = c = 0;
    fin >> a >> b >> c >> ws;
    cout << a << " " << b << " " << c << endl;

    string str;
    getline(fin, str);
    cout << str << endl;

    fin.close();

    system("pause"); // Windows only: keeps the console window open

    return 0;
}



If anyone could help, it would be great.
Thanks, keep up the good work

enumerator
06-25-2005, 12:39 AM
I can't help with the language, but can tell you that this sort of thing is pretty straightforward with the Windows Script Host, FileSystemObject, and a regular expression (if you're on Windows). :)

bazpaul
06-30-2005, 11:47 AM
Hey Enumerator,

Any suggestions on how I can use one of these to change multiple file names and the text in files?

Thanks in advance

enumerator
06-30-2005, 12:42 PM
Yeah, but it may not be too helpful (until you describe exactly what needs to change)... I'll see if I can find a generic snippet in the meantime.

enumerator
06-30-2005, 07:31 PM
Alright, this is sort of generic; in fact, it does just about nothing.

<?XML version="1.0"?>
<job>
<?job error="true" debug="true"?>

<runtime>
<usage><![CDATA[Drag & Drop a folder onto the script file, run it with an address parameter, or choose from the dialog to follow...]]></usage>
</runtime>

<comment>----------------------------------------------------<![CDATA[

Use the "bstrFilter" resource to filter by file name or type.

Separate filters with a semicolon: *.txt;*.asx; etc.

To include all names & types, use: *.*

]]>---------------------------------------------------------</comment>

<resource id="bstrFilter"><![CDATA[*.txt;]]></resource>
<resource id="nullObject"><![CDATA[Action Cancelled: required automation object(s) unavailable.]]></resource>
<resource id="nullFilter"><![CDATA[Action Cancelled: no files matching bstrFilter were found.]]></resource>
<resource id="oncomplete"><![CDATA[Operation Complete]]></resource>

<script language="JScript">
<![CDATA[

function myFindReplace(strText)
{
    // define a routine HERE, and return its result...

    return strText;
}

function myFileRename(filename,iteration)
{
    // define a routine HERE, and return its result...

    return filename;
}

(function(){with(WScript)
{
    var IShellDispatch, IWshShell, IFileSystem;
    if(!((IShellDispatch = AXO("Shell.Application")) && (IWshShell = AXO("WScript.Shell")) && (IFileSystem = AXO("Scripting.FileSystemObject"))))
        return Echo(getResource("nullObject"));

    var Folder;
    if(!(Folder = attempt(function(){return IShellDispatch.NameSpace(Arguments.Item(0));})))
    {
        Arguments.ShowUsage();
        if(!(Folder = IShellDispatch.BrowseForFolder(0, "Select a folder, or cancel...", 0x0200)))
            return;
    }

    var Files = Folder.Items();
    Files.Filter(0x0040,getResource("bstrFilter"));

    var i, item;
    if(!(i = Files.Count))
        return Echo(getResource("nullFilter"));

    if(IWshShell.Popup("Script will operate on " + i + " files...", 0, Folder.Title, 0x00040131) != 1)
        return;

    var strText, textStream, textFile;
    while(item = Files.Item(--i))
    {
        strText = attempt(function(){
            return (textStream = (textFile = IFileSystem.GetFile(item.Path)).OpenAsTextStream(1)).AtEndOfStream ?
                null : textStream.ReadAll();});
        if(!textStream)
            continue;

        textStream.Close();

        if(!strText || textFile.Attributes & 1)
            continue;

        (textStream = textFile.OpenAsTextStream(2)).Write(myFindReplace(strText));
        textStream.Close();

        item.Name = sysConvention(myFileRename(item.Name,i)) || item.Name;
    }

    Echo(getResource("oncomplete"));

    function attempt(method){
        try{throw method();} catch(result){return result instanceof Error ? null : result;}}

    function sysConvention(filename){
        var
        p1 = /[\x00-\x1f<"\/?\\*|:>]/g,

        p2 = /^[\. ]+|[\. ]+$/g,

        p3 = RegExp("(^AUX$|^CLOCK\\$$|^COM1$|^COM2$|^COM3$|^COM4$|^COM5$|^COM6$|^COM7$|^COM8$|^COM9$|" +
            "^CON$|^LPT1$|^LPT2$|^LPT3$|^LPT4$|^LPT5$|^LPT6$|^LPT7$|^LPT8$|^LPT9$|^NUL$|^PRN$)","i");

        return filename.replace(p1,"").replace(p2,"").replace(p3,"$1_");}

    function AXO(progId,loc){
        if(!(progId instanceof Array))
            return bind(progId);
        function bind(pId){
            try{
                throw loc ? new ActiveXObject(pId,loc) : new ActiveXObject(pId);}
            catch(obj){
                return obj instanceof Error ? null : obj;}}
        var retval, p, i = -1;
        while(p = progId[++i])
            if(retval = bind(p))
                break;
        return retval;}
}})();

]]>
</script>
</job>

Be sure to test on copied files...

Links for regular expression tutorials are up in the JavaScript forum.

Save as *.wsf

enumerator
07-01-2005, 09:11 AM
Yeah, well, besides error handling it's straightforward. ;)

I went back and removed the -2 tristate parameter from OpenAsTextStream; it now defaults to 0 (ASCII). Otherwise, attempting to write a stream from Unicode to a different system default may erase the file. They don't seem to mention that one... :rolleyes: