  • #1
    Regular Coder the-dream's Avatar
    Join Date
    Mar 2007
    Location
    Northamptonshire, UK
    Posts
    477
    Thanks
    8
    Thanked 4 Times in 4 Posts

    Help Improve Accuracy In My Contextual Ad Targeting Script

    Hey Guys...

    I run a small untargeted ad network, called Branchr Advertising...

    I am currently coding a contextual advertising system (code below), and I am wondering how I can make it more accurate, i.e. generate a more relevant set of keywords for a given website/page.

    PHP Code:
    <?php

    function analyze($url, $output = 'array') {
    // PAGE FUNCTIONS //
    // NOTE: THESE NESTED FUNCTIONS ARE DECLARED WHEN analyze() RUNS, SO analyze() CAN ONLY BE CALLED ONCE PER REQUEST.
    function strip_html_tags( $text )
    {
        $text = preg_replace(
            array(
                // REMOVE INVISIBLE CONTENT
                '@<head[^>]*?>.*?</head>@siu',
                '@<style[^>]*?>.*?</style>@siu',
                '@<script[^>]*?.*?</script>@siu',
                '@<object[^>]*?.*?</object>@siu',
                '@<embed[^>]*?.*?</embed>@siu',
                '@<applet[^>]*?.*?</applet>@siu',
                '@<noframes[^>]*?.*?</noframes>@siu',
                '@<noscript[^>]*?.*?</noscript>@siu',
                '@<noembed[^>]*?.*?</noembed>@siu',
                // ADD LINE BREAKS AFTER BLOCKS
                '@</?((address)|(blockquote)|(center)|(del))@iu',
                '@</?((div)|(h[1-9])|(ins)|(isindex)|(p)|(pre))@iu',
                '@</?((dir)|(dl)|(dt)|(dd)|(li)|(menu)|(ol)|(ul))@iu',
                '@</?((table)|(th)|(td)|(caption))@iu',
                '@</?((form)|(button)|(fieldset)|(legend)|(input))@iu',
                '@</?((label)|(select)|(optgroup)|(option)|(textarea))@iu',
                '@</?((frameset)|(frame)|(iframe))@iu',
            ),
            array(
                ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ',
                "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0",
                "\n\$0", "\n\$0",
            ),
            $text );
        return strip_tags( $text );
    }

    function removeCommonWords($input){

    // CREATE AN ARRAY OF COMMON/BANNED WORDS
    $commonWords = array(
        'a','able','about','above','abroad','according','accordingly','across','actually','adj','after','afterwards','again','against','ago','ahead','aint','all','allow','allows','almost','alone','along','alongside','already','also','although','always','am','amid','amidst','among','amongst','an','and','another','any','anybody','anyhow','anyone','anything','anyway','anyways','anywhere','apart','appear','appreciate','appropriate','are','arent','around','as','as','aside','ask','asking','associated','at','available','away','awfully',
        'b','back','backward','backwards','be','became','because','become','becomes','becoming','been','before','beforehand','begin','behind','being','believe','below','beside','besides','best','better','between','beyond','both','brief','but','by',
        'c','came','can','cannot','cant','cant','caption','cause','causes','certain','certainly','changes','clearly','cmon','co','co.','com','come','comes','concerning','consequently','consider','considering','contain','containing','contains','corresponding','could','couldnt','course','cs','currently',
        'd','dare','darent','definitely','described','despite','did','didnt','different','directly','do','does','doesnt','doing','done','dont','down','downwards','during',
        'e','each','edu','eg','eight','eighty','either','else','elsewhere','end','ending','enough','entirely','especially','et','etc','even','ever','evermore','every','everybody','everyone','everything','everywhere','ex','exactly','example','except',
        'f','fairly','far','farther','few','fewer','fifth','first','five','followed','following','follows','for','forever','former','formerly','forth','forward','found','four','from','further','furthermore',
        'g','get','gets','getting','given','gives','go','goes','going','gone','got','gotten','greetings',
        'h','had','hadnt','half','happens','hardly','has','hasnt','have','havent','having','he','hed','hell','hello','help','hence','her','here','hereafter','hereby','herein','heres','hereupon','hers','herself','hes','hi','him','himself','his','hither','hopefully','how','howbeit','however','hundred',
        'i','id','ie','if','ignored','ill','im','immediate','in','inasmuch','inc','inc.','indeed','indicate','indicated','indicates','inner','inside','insofar','instead','into','inward','is','isnt','it','itd','itll','its','its','itself','ive',
        'j','just','k','keep','keeps','kept','know','known','knows',
        'l','last','lately','later','latter','latterly','least','less','lest','let','lets','like','liked','likely','likewise','little','look','looking','looks','low','lower','ltd',
        'm','made','mainly','make','makes','many','may','maybe','maynt','me','mean','meantime','meanwhile','merely','might','mightnt','mine','minus','miss','more','moreover','most','mostly','mr','mrs','much','must','mustnt','my','myself',
        'n','name','namely','nd','near','nearly','necessary','need','neednt','needs','neither','never','neverf','neverless','nevertheless','new','next','nine','ninety','no','nobody','non','none','nonetheless','noone','noone','nor','normally','not','nothing','notwithstanding','novel','now','nowhere',
        'o','obviously','of','off','often','oh','ok','okay','old','on','once','one','ones','ones','only','onto','opposite','or','other','others','otherwise','ought','oughtnt','our','ours','ourselves','out','outside','over','overall','own',
        'p','particular','particularly','past','per','perhaps','placed','please','plus','possible','presumably','probably','provided','provides',
        'q','que','quite','qv','r','rather','rd','re','really','reasonably','recent','recently','regarding','regardless','regards','relatively','respectively','right','round',
        's','said','same','saw','say','saying','says','second','secondly','see','seeing','seem','seemed','seeming','seems','seen','self','selves','sensible','sent','serious','seriously','seven','several','shall','shant','she','shed','shell','shes','should','shouldnt','since','six','so','some','somebody','someday','somehow','someone','something','sometime','sometimes','somewhat','somewhere','soon','sorry','specified','specify','specifying','still','sub','such','sup','sure',
        't','take','taken','taking','tell','tends','th','than','thank','thanks','thanx','that','thatll','thats','thats','thatve','the','their','theirs','them','themselves','then','thence','there','thereafter','thereby','thered','therefore','therein','therell','therere','theres','theres','thereupon','thereve','these','they','theyd','theyll','theyre','theyve','thing','things','think','third','thirty','this','thorough','thoroughly','those','though','three','through','throughout','thru','thus','till','to','together','too','took','toward','towards','tried','tries','truly','try','trying','ts','twice','two',
        'u','un','under','underneath','undoing','unfortunately','unless','unlike','unlikely','until','unto','up','upon','upwards','us','use','used','useful','uses','using','usually',
        'v','value','various','versus','very','via','viz','vs',
        'w','want','wants','was','wasnt','way','we','wed','welcome','well','well','went','were','were','werent','weve','what','whatever','whatll','whats','whatve','when','whence','whenever','where','whereafter','whereas','whereby','wherein','wheres','whereupon','wherever','whether','which','whichever','while','whilst','whither','who','whod','whoever','whole','wholl','whom','whomever','whos','whose','why','will','willing','wish','with','within','without','wonder','wont','would','wouldnt',
        'x','y','yes','yet','you','youd','youll','your','youre','yours','yourself','yourselves','youve','z','zero',
        'january','february','march','april','may','june','july','august','september','october','november','december','dont','weve','theyre','comments','opinions','week','day','month','year','hour','min','minute','second','nbsp','newsnbsp','new','old',
        'monday','tuesday','wednesday','thursday','friday','saturday','sunday','new','web','mark','michael','christian','bao',
        'jan','feb','mar','apr','may','jun','jul','aug','sep','oct','nov','dec','gain','loss','move','isnt','good','bad','ok','okey','okay','jim','john','smith','bill','today','tomorrow','lot','lost','lots','companie','put','high','low','top','bottom'
    );

    // ESCAPE REGEX CHARACTERS SO ENTRIES LIKE 'co.' ONLY MATCH LITERALLY (UNESCAPED, 'co.' WOULD ALSO REMOVE 'cow')
    $pattern = '/\b('.implode('|', array_map('preg_quote', $commonWords)).')\b/';
    return preg_replace($pattern, '', $input);
    }
    // END FUNCTIONS //


    // GET THE 'TARGET' PAGE SOURCE AND PUT INTO THE $source VARIABLE.
    $source = file_get_contents($url);

    // USE OUR FUNCTION AND A STANDARD FUNCTION TO STRIP PAGE HTML.
    $source = strip_tags(strip_html_tags($source));

    // REMOVE ANY SPECIAL CHARACTERS
    $source = preg_replace("/[^A-Za-z ]/", "", $source);

    // MAKE ALL WORDS LOWER CASE
    $source = strtolower($source);

    // REMOVE DISALLOWED WORDS
    $source = removeCommonWords($source);

    // SPLIT THE GROUP OF WORDS UP WHERE THERE IS A SPACE, PUT RESULT INTO AN ARRAY
    $keywords = preg_split("/ /", $source);
    // START WITH AN EMPTY ARRAY OF FILTERED WORDS
    $words = array();

    foreach ($keywords as $wordnum => $keyword) {
        // ENSURE THE KEYWORD ISN'T NOTHING
        if ($keyword != '') {
            // ENSURE THE KEYWORD IS OVER 2 CHARS.
            if (strlen($keyword) > 2) {
                // ENSURE THE KEYWORD IS LESS THAN OR EQUAL TO 10 CHARS
                if (strlen($keyword) <= 10) {
                    // DE-PLURALIZE WORDS
                    if (substr($keyword, -1) == 's') {
                        $keyword = substr($keyword, 0, -1);
                    }
                    // PUT FILTERED WORDS INTO A NEW ARRAY
                    $words[] = $keyword;
                }
            }
        }
    }

    // COUNT THE TOTAL AMOUNT OF KEYWORDS
    $tw = count($keywords);

    // COUNT HOW MANY TIMES EACH WORD OCCURS AND PUT INTO ARRAY (word => count)
    $key = array();
    foreach ($words as $wn => $kw) {
        $key[$kw] = isset($key[$kw]) ? $key[$kw] + 1 : 1;
    }

    $keyrank = array();
    foreach ($key as $keywd => $occurances) {
        // ONLY INCLUDE WORDS THAT HAVE BEEN USED 3 TIMES OR MORE
        if ($occurances >= 3) {
            // GIVE EACH WORD A SCORE BASED ON ITS OCCURRENCES FROM THE TOTAL AMOUNT OF WORDS ON THE PAGE
            $keyrank[$keywd] = round(100 / $tw * $occurances, 3);
        }
    }

    // SORT THE KEYWORDS ARRAY SO THE HIGHEST RANKED WORDS ARE AT THE TOP
    arsort($keyrank, SORT_NUMERIC);

    // CREATE AN ARRAY OF RESULTS
    if ($output == 'array') {
        return $keyrank;
    }

    // CREATE A TABLE OF RESULTS
    if ($output == 'table') {
        $results = '<table>';
        $results .= '<tr><td><b><u>Keyword:</u></b></td><td><b><u>Rank:</u></b></td></tr>';
        foreach ($keyrank as $keyword => $rank) {
            $results .= '<tr><td><b>'.$keyword.'</b></td><td>'.$rank.'</td></tr>';
        }
        $results .= '</table>';
        return $results;
    }
    }


    $url = $_GET['url'];
    $siteKeywords = analyze($url);

    // Top 50 Keywords
    $count = 1;
    $kw = array();
    $kwds = '';
    foreach ($siteKeywords as $keyword => $rank) {
        if ($count <= 50) {
            $kw[$keyword] = $rank;
            $kwds .= "$keyword ";
        } else {
            break;
        }
        $count++;
    }



     
    $query = "SELECT * , 
       match(text_1,text_2,keywords) 
       against ('$kwds') 
       as relevance
    FROM 
       text_ads_test
    WHERE 
      match(text_1,text_2,keywords) against ('$kwds')
       ORDER BY relevance DESC";

    $ads = mysql_query($query) or die(mysql_error());
    ?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html>
    <head>
    <style type="text/css">

    body {
        font-size: 13px;
        font-family: "Helvetica Neue", Helvetica, Verdana, Arial, sans-serif;
    }

    .green {
        color: #009f04;
    }

    .red {
        color: #a90000;
    }

    </style>
    <title>Branchr - Contextual Ad Targeting</title>
    </head>

    <body>
    <?php
    // ESCAPE THE URL BEFORE PRINTING IT, SINCE IT COMES STRAIGHT FROM THE QUERY STRING
    echo '<h2 style="padding: 0px; margin: 0px;">Branchr Contextual Ad Targeting</h2><hr /><b>The following ads are targeted to the content of:</b><br /><i>'.htmlspecialchars($url).'</i><br /><hr />';

    $foundAds = mysql_num_rows($ads);

    if ($foundAds > 0) {
        echo 'Status: <b class="green">Targeting Successful ('.$foundAds.' Relevant Ads)</b><br />Generated Keyword Set: <b>'.$kwds.'</b><hr />';
        while ($ad = mysql_fetch_array($ads)) {
            echo "<b>".$ad['title']."</b> (Relevance: ".$ad['relevance'].")<br />".$ad['text_1']."<br />".$ad['text_2']."<br /><br />";
        }
    } else {
        echo 'Status: <b class="red">Targeting Unsuccessful</b><br />Generated Keyword Set: <b>'.$kwds.'</b><hr />';
        echo ':(';
    }

    ?>
    </body>
    </html>

    Thanks for your help!

    P.S. If you run this code, you might want to remove the DB code, and have it just echo out a keyword set...

  • #2
    Regular Coder hinch's Avatar
    Join Date
    Sep 2005
    Location
    UK
    Posts
    923
    Thanks
    25
    Thanked 80 Times in 80 Posts
    It would be worthwhile reading this:

    http://www10.org/cdrom/papers/519/

    It's about recommendation algorithms. In your case, your "buyers" would be the categories the advertiser selects to display the ad in, and the heap to search through would be your publishers' websites.

    Run the formulae, and the top 5 matches would be, for example, your "recommended" (i.e. ideal) sites to put the adverts on when matching against the advertiser's preferred categories.
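
    A rough sketch of the "score every site and keep the top 5" idea, assuming each advert and each site has already been reduced to a keyword => weight array. Cosine similarity is used here only as a simple stand-in; it is not necessarily the formula from the linked paper, and cosineSimilarity() and topMatches() are made-up names.

```php
<?php
// Hypothetical helper: cosine similarity between two keyword => weight maps.
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    foreach ($a as $word => $weight) {
        if (isset($b[$word])) {
            $dot += $weight * $b[$word];
        }
    }
    $square = function ($w) { return $w * $w; };
    $normA = sqrt(array_sum(array_map($square, $a)));
    $normB = sqrt(array_sum(array_map($square, $b)));
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0; // an empty profile matches nothing
    }
    return $dot / ($normA * $normB);
}

// Score every site against the advert's profile and keep the best $n.
function topMatches(array $adProfile, array $siteProfiles, int $n = 5): array
{
    $scores = array();
    foreach ($siteProfiles as $site => $profile) {
        $scores[$site] = cosineSimilarity($adProfile, $profile);
    }
    arsort($scores, SORT_NUMERIC);
    return array_slice($scores, 0, $n, true);
}
```

    Calling topMatches($adProfile, $siteProfiles) returns a site => score array, best match first.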
    A programmer is just a tool which converts caffeine into code

    My work: http://www.fcsoftware.co.uk && http://www.firstcontactcrm.com
    My hobby: http://www.angel-computers.co.uk
    My life: http://www.furious-angels.com

  • #3
    Regular Coder the-dream's Avatar
    Join Date
    Mar 2007
    Location
    Northamptonshire, UK
    Posts
    477
    Thanks
    8
    Thanked 4 Times in 4 Posts
    Looks like a pretty painful thing to read... :/

    But, I'll be sure to chug on with it (may take a while). In the meantime, any small (or big) changes I can make to my current script to improve accuracy?

  • #4
    Regular Coder hinch's Avatar
    Join Date
    Sep 2005
    Location
    UK
    Posts
    923
    Thanks
    25
    Thanked 80 Times in 80 Posts
    Not really. You're doing a simple occurrence count and sort; it's about as simple as it gets.

    I suppose you could add in weighted keywords.
    So adverts in the "gardening" category, for example, may have a set of keywords that are weighted more heavily than standard occurrence words. If "garden" is a weighted keyword and it appears on a site, then that site is preferred over one that has, say, "flowers" as a keyword but no occurrence of "garden".

    Though you're treading on murky ground doing that, as it means having to think up as many weighted keywords as you can for each of your ad categories.

    As for recommendation system matching, I've done it a couple of times before on e-commerce sites and it's a swine; expect it to drive you mad before you're finished.
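
    To make the weighted keyword idea concrete, here is a minimal sketch. The categories, words and weights in $categoryWeights are invented for illustration, and scoreCategories() is a hypothetical helper that expects the word => count array the existing script already builds.

```php
<?php
// Hypothetical data: hand-picked weighted keywords per ad category.
$categoryWeights = array(
    'gardening' => array('garden' => 5, 'flower' => 2, 'lawn' => 3),
    'motoring'  => array('car' => 5, 'engine' => 3),
);

// Score a page's word => count map against every category, best first.
function scoreCategories(array $wordCounts, array $categoryWeights): array
{
    $scores = array();
    foreach ($categoryWeights as $category => $weights) {
        $score = 0;
        foreach ($weights as $word => $weight) {
            if (isset($wordCounts[$word])) {
                // Each occurrence counts $weight times instead of once.
                $score += $weight * $wordCounts[$word];
            }
        }
        $scores[$category] = $score;
    }
    arsort($scores, SORT_NUMERIC);
    return $scores;
}
```

    With weights like these, a page that mentions "garden" a couple of times outranks one that only mentions "flower", which is the preference described above.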

  • #5
    Regular Coder the-dream's Avatar
    Join Date
    Mar 2007
    Location
    Northamptonshire, UK
    Posts
    477
    Thanks
    8
    Thanked 4 Times in 4 Posts
    Other than occurrence scoring for keywords, what would be a different option? Not weighted keywords, though: I would love to do that, but it would take months (if not years) to create an accurate set and subset of keywords and phrases.

  • #6
    Regular Coder hinch's Avatar
    Join Date
    Sep 2005
    Location
    UK
    Posts
    923
    Thanks
    25
    Thanked 80 Times in 80 Posts
    Perhaps the easiest way would be to have your publishers pick 5 categories that their site falls into when they sign up.

    Then you can just display ads by default based on category. It kind of does your targeted ads, but without all the work involved.

    You could also add background click-through tracking to your current system. Say your current system picks 10 sites, but 2 of those sites get bugger all click-throughs. You could then say: OK, even though they match, they go onto a disallow list of sorts for this ad, as it's not seeing the returns you'd hope for.

    Of course, the downside is that small sites with, say, 100 visits a month may only get 1 click-through anyway, but that's just the way it goes.
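
    One way to soften the small-site problem is to only judge a site once it has served enough impressions. A minimal sketch, assuming you log impressions and clicks per ad/site pair; shouldDisallow() and both thresholds are illustrative, not values from this thread.

```php
<?php
// Decide whether a site goes on the disallow list for a given ad.
// $minImpressions and $minCtr are made-up defaults; tune them to your network.
function shouldDisallow(int $impressions, int $clicks, int $minImpressions = 1000, float $minCtr = 0.001): bool
{
    // Too little traffic to judge fairly: keep the site for now.
    if ($impressions === 0 || $impressions < $minImpressions) {
        return false;
    }
    // Disallow only when the click-through rate is below the floor.
    return ($clicks / $impressions) < $minCtr;
}
```

    A site with 100 impressions and no clicks stays in rotation, while one with 5,000 impressions and a single click gets dropped for that ad.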

  • #7
    Regular Coder the-dream's Avatar
    Join Date
    Mar 2007
    Location
    Northamptonshire, UK
    Posts
    477
    Thanks
    8
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by hinch View Post
    Perhaps the easiest way would be to have your publishers pick 5 categories that their site falls into when they sign up.
    Probably would've been easier had I known I would eventually make it targeted, but there are 2,300 websites on the network now... so that could be a slight issue.

    I'm just trying to think of an automated way to generate a keyword set for specific pages, then store the generated keyword set in a database, to use when an ad request is called.

    I mean, what I have at the minute could probably be edited in about half an hour to achieve what I want, but I want to get the best keyword sets I can from code similar to what I have now...

  • #8
    Regular Coder hinch's Avatar
    Join Date
    Sep 2005
    Location
    UK
    Posts
    923
    Thanks
    25
    Thanked 80 Times in 80 Posts
    You could read the meta keywords from the page instead of just dumping the header. Most sites that have been SEO'd have some form of keywords in them.

    Also, a bulk email to all your publishers asking them to categorise their site would solve that issue; in the meantime, they get dumped into a generic category.
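
    For the meta keywords part, PHP's built-in get_meta_tags() does most of the work. A minimal sketch; readMetaKeywords() is a made-up wrapper name, and passing a URL assumes allow_url_fopen is enabled on your server.

```php
<?php
// Return the page's meta keywords as a list of lower-case phrases.
// $location can be a URL or a local file path; get_meta_tags() accepts both.
function readMetaKeywords(string $location): array
{
    $tags = @get_meta_tags($location);
    if ($tags === false || !isset($tags['keywords'])) {
        return array(); // page unreadable or no keywords tag
    }
    $keywords = array();
    foreach (explode(',', $tags['keywords']) as $phrase) {
        $phrase = strtolower(trim($phrase));
        if ($phrase !== '') {
            $keywords[] = $phrase;
        }
    }
    return $keywords;
}
```

    The same $tags array also holds 'description' when the page has one, so both can be read in a single call.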

  • #9
    Regular Coder the-dream's Avatar
    Join Date
    Mar 2007
    Location
    Northamptonshire, UK
    Posts
    477
    Thanks
    8
    Thanked 4 Times in 4 Posts
    Yeah, I could read meta keywords, but there is an issue...

    For example, there is a site in our network about cars, so if I analyzed the meta keywords for the home page, I would get general ads about cars.

    But say he puts a post about car restoration on a page of his site. If I analyze the meta keywords of that page and they're the same as the homepage's, then I will just get generic info about cars, whereas if I analyzed the content, I would get a keyword set based on car restoration.

    Do you see how this will give a different result set than analyzing the content? A less targeted result set...

    --

    I suppose I could bulk email everyone in the network, but then again, there is no guarantee that they'll all reply. And again, it does some of the work, but then only generic ads are found, not ones that are targeted to the page's content (rather than just the website's content).

  • #10
    Regular Coder hinch's Avatar
    Join Date
    Sep 2005
    Location
    UK
    Posts
    923
    Thanks
    25
    Thanked 80 Times in 80 Posts
    I didn't mean use the metas as the only method. I meant combine your current method of content weighting with meta keywords, so you have the more generic metas for general ad selection, then do a quick content parse on each page as it's loaded to narrow down the result set further.

    Also remember that most forum and blog software creates different metas per page, dependent on the content and any "tags" set/selected.

  • #11
    Regular Coder Zangeel's Avatar
    Join Date
    Oct 2007
    Location
    public_html/
    Posts
    638
    Thanks
    17
    Thanked 79 Times in 79 Posts
    Quote Originally Posted by the-dream View Post
    Looks like a pretty painful thing to read... :/

    But, I'll be sure to chug on with it (may take a while). In the meantime, any small (or big) changes I can make to my current script to improve accuracy?
    It made my eyes bleed

    But what do you reckon are the weak points in your function?
    PHP Code:
    $aString = is_string((string)array()) ? true : false; // true :D

  • #12
    New to the CF scene
    Join Date
    Oct 2009
    Location
    Perhaps on the Internet...
    Posts
    4
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I fixed a couple of bugs in your script for you.

    PHP Code:
    // GET THE 'TARGET' PAGE SOURCE AND PUT INTO THE $source VARIABLE.
    $source = @file_get_contents($url) or die ("Could not connect to ".$url); 
    This is needed; otherwise, if the URL doesn't work, the script will return an error.
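
    A variation on the same check, as a sketch: comparing against false explicitly means a page that loads but happens to be empty is not treated as a failed connection, which the "or die" form would do. fetchPage() is a hypothetical helper name.

```php
<?php
// Fetch a page and return its source, or null if it could not be read.
function fetchPage(string $url): ?string
{
    $source = @file_get_contents($url);
    if ($source === false) {
        return null; // request failed (bad URL, timeout, missing file, ...)
    }
    return $source; // may legitimately be an empty string
}
```

    The caller can then decide what to do with a null result (log it, skip the URL, show default ads) instead of ending the script.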


    PHP Code:
    if (!empty($keyrank)) {

    // SORT THE KEYWORDS ARRAY SO THE HIGHEST RANKED WORDS ARE AT THE TOP
    arsort($keyrank, SORT_NUMERIC);

    } else {

    echo "<br>No keywords were found on your given URL.<br>";
    echo "<br>Your url: ".$url."<br>";
    die;

    }
    // CREATE AN ARRAY OF RESULTS 
    This is also needed; otherwise, for URLs with no keywords, the script will return an error. You could also make use of a URL with no keywords by adding it to, for example, a MySQL database, setting your advertising interface to show no sponsored results for it (or just default or RON ads), and running a cron job that checks the no-keyword URLs every day or so to see if keywords are now available.

    Another improvement would be to check the <meta name="keywords">, "description" and "title" for keywords, and add them to the keyword list with more weight. For example, you could make every keyword found in the title, description or keywords worth 2, 3 or more occurrences, whichever you find more accurate.
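
    A minimal sketch of that weighting, assuming you already have the body's word => count array and a list of words pulled from the title/meta tags. addWeightedWords() and the default weight of 3 are illustrative only.

```php
<?php
// Add title/meta words to the body counts, each worth $weight occurrences.
function addWeightedWords(array $counts, array $extraWords, int $weight = 3): array
{
    foreach ($extraWords as $word) {
        $word = strtolower(trim($word));
        if ($word === '') {
            continue; // skip blanks left over from splitting
        }
        if (!isset($counts[$word])) {
            $counts[$word] = 0;
        }
        $counts[$word] += $weight;
    }
    return $counts;
}
```

    Running this before the "3 times or more" filter means a word that appears in the title with a weight of 3 always makes it into the ranking, even if the body never repeats it.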

