Sphinx full-text search
Sphinx is a full-text search engine, distributed under GPL version 2. Commercial license is also available for embedded use.
Generally, it’s a standalone search engine, meant to provide fast, size-efficient and relevant fulltext search functions to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting languages. Currently built-in data sources support fetching data either via direct connection to MySQL or PostgreSQL, or using XML pipe mechanism (a pipe to indexer in special XML-based format which Sphinx recognizes).
As for the name, Sphinx is an acronym which is officially decoded as SQL Phrase Index. Yes, I know about CMU’s Sphinx project.
Key features
- high indexing speed (upto 10 MB/sec on modern CPUs)
- high search speed (avg query is under 0.1 sec on 2-4 GB text collections)
- high scalability (upto 100 GB of text, upto 100 M documents on a single CPU)
- supports distributed searching (since v.0.9.6)
- supports MySQL natively (MyISAM and InnoDB tables are both supported)
- supports phrase searching
- supports phrase proximity ranking, providing good relevance
- supports English and Russian stemming
- supports any number of document fields (weights can be changed on the fly)
- supports document groups
- supports stopwords
- supports different search modes (”match all”, “match phrase” and “match any” as of v.0.9.5)
- generic XML interface which greatly simplifies custom integration
- pure-PHP (ie. NO module compiling etc) search client API
Distribution
Sphinx distribution includes the following programs:
- indexer: an utility to create fulltext indices;
- search: a simple (test) utility to query fulltext indices from command line;
- searchd: a daemon to search through fulltext indices from external software (Web scripts using Sphinx API; or MySQL with SphinxSE; or your application server);
- sphinxapi: a set of API libraries for popular Web scripting languages (there are native API ports for PHP, Python, Java, Perl, and Ruby).
You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.



Leave a Reply