My new project is to build a web search engine with a web spider, and I'm thinking of using three languages: Python, Java and C++. I'm somewhat confused about which language is best suited to each component: the web crawler, the content indexer, the ranking algorithm and the search mechanism.
I fully agree that some programming languages deliver optimal performance for certain tasks and lag behind in other areas, so I want to make the right choices. A friend of mine suggested that I use C++ for features that demand maximum speed and Python for glue code that is not very time-critical. But I'm not yet sure exactly which features will require that kind of speed, so I'd welcome some guidance on that too.
Now my questions are:
- Where should Python come in? Which features should it be used for?
- Which language (C++ or Java) is more suitable for developing a web crawler, and why?
- Which language is best suited for developing a search ranking algorithm - C++ or Java?
- Which features of the search engine should C++ be used for?
- Which features should Java be used for?
- Do these three languages make a good combination when developing a search application?
- Which database management system would be a good fit for this type of application? Will MySQL be reliable enough, or is there another database system that would be more suitable?
Please enlighten me on the points above so that I'm better equipped to get down to work. Any helpful responses and suggestions will be highly appreciated.