View Full Version : What are the techniques available to search a database?
Hi:
Can someone provide a list of the most common techniques used to search a database for matches based on user-entered keywords etc?
At this point the only SQL syntax I'm aware of to search a db is using the LIKE clause:
SELECT article FROM mytable WHERE article LIKE "php variable";
Is this an efficient enough way, even for a large table? For example, what kind of algorithm do commercial scripts like VB or blog use to search its content?
Thanks,
The speed of the database search is determined primarily by the database engine, not by the application used to access the database. There are also a bunch of other factors to consider. Probably the most important one is whether or not the field you are matching is indexed within the database. If it is, your query is going to be faster.
The SELECT ... statement is the correct (and only real) way to search a database (there are other operators aside from 'LIKE' -- the usual suspects of =, >=, <=, <> etc. These are all faster if you're searching non-text fields.
To answer your question, you would need to look at the specific benchmarks for whatever database you would need to know about. I know, for example, mySql is very fast for simple selects (SELECT * FROM table WHERE criteria = pattern). It is not as efficient with complex queries (the sort of stuff you would build a view for in Oracle or MS SQL) and for inserts and the like.
Where it becomes a more complex question is with joined queries. When you are combining data from more than one table, then you do need to look at execution speed of the application code, as well as the database speed (is it quicker to run two simple queries and combine the results using php or to use one complex query directly on the database -- again, I don't know the answers, but benchmarks are the only way to find out).
A view, which I just mentioned, is sort of a permanent query. It is a a mixture of a query and a table (it is used as if it's a table, but it is based, like a query, on one or more underlying tables). This is very common for frequent, complex enquiries in large databases (in my old job one thing I ended up working on was specifying the data that was required for the financial reports out of the complex finacial tables -- about 40 tables with up to 250,000 records in the big ones. These specs were turned into views and the views are searched to build the reports). Again, the database engine does the work to produce the views, but it's not part of the processing time for querying the view.
To answer your question in short:
1. It depends on the database engine.
2. It depends on the data definition (eg indexing, views).
3. It depends on the nature of the search (LIKE vs =).
4. It depends on the complexity of the queries.
5. After all of that, it depends on the application.
To give a general answer is not that helpful. You need to know some of these specifics.
Thanks Kiwi. That was a very interesting and thought-provoking response. To elaborate on my question, the database would be mySQL. Apart from LIKE, I was wondering if there are any other ways for searching a table, and their efficiency in searching a lot of text. For example, how does vBulletin or phpNuke construct the search queries when a user attempts to search for a particular text? Somehow just using LIKE %usersearchterm% seems too good to be true.
I noticed mysql also supports "full text" search: http://www.mysql.com/doc/en/Fulltext_Search.html Does anyone have any info on it, such as when best to use the technique?
Thanks,
Here are the mySql benchmarks (http://www.mysql.com/information/benchmarks.html) (published by mySql).
HEre's a thread talking about like clauses in mySql (http://www.geocrawler.com/mail/msg.php3?msg_id=55799&list=8) -- it has some useful information in it. The key is if you're searching at the start of the field, indexing like will make it run fast (but use up a lot of disk-space). If you're searching anywhere in the string, then LIKE will be slow (and so will almost anything else, unfortunately).
Using regex matches is another option, but won't be much faster (if I remember rightly, most SQL converts a LIKE comparator to regex itself).
(Just a note: mySql is a very fast database -- probably the fastest on the market. Most of this speed is achieved by not including three sets of database features: foreign keys, transactions and stored procedures. For any application that doesn't require these features or an advanced security model, then mySql is an excellent choice.)
vBulletin® v3.8.2, Copyright ©2000-2009, Jelsoft Enterprises Ltd.