What is Solr?
Apache Solr is a fast, open-source search server written in Java. Solr lets you easily create search
engines that search websites, databases, and files.
Solr is the popular, blazing fast open source enterprise search platform from the Apache
Lucene project. Solr is powered by Lucene, a powerful open-source full-text search library,
under the hood.
• Doug Cutting created Lucene in 1999. It was recognized as a top-level Apache Software
Foundation project in 2005.
• Yonik Seeley created Solr in 2004. It graduated from the Apache Incubator in 2007 as a
subproject of Lucene.
Its major features include powerful full-text search, hit highlighting, faceted search,
dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and
geospatial search.
Solr is highly scalable, providing distributed search and index replication, and it powers
the search and navigation features of many of the world's largest internet sites.
Solr is written in Java and runs as a standalone full-text search server within a servlet
container such as Jetty.
Solr uses the Lucene Java search library at its core for full-text indexing and search, and
has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming
language.
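Because the API is plain HTTP, a client only needs to build a URL and parse the response. The sketch below shows what such a request might look like from Python. The host, the port, and the core name "products" are assumptions for illustration; only the /select handler and the q, rows, and wt parameters come from Solr itself.

```python
from urllib.parse import urlencode

# Assumed values: a local Solr on its default port and a hypothetical core "products".
SOLR_BASE_URL = "http://localhost:8983/solr"
CORE_NAME = "products"


def build_select_url(query, rows=10):
    """Return the URL of a JSON search request against the core's /select handler."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{SOLR_BASE_URL}/{CORE_NAME}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Fetching this URL with any HTTP client returns the matching documents as JSON.
    print(build_select_url("name:laptop"))
```

Any language with an HTTP client and a JSON parser can do the same, which is why no Solr-specific library is strictly required.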
Solr's powerful external configuration allows it to be tailored to almost any type of
application without Java coding, and it has an extensive plugin architecture when more
advanced customization is required.
For example, Solr makes it easy to add search to an online store through the following
steps:
• Define a schema. The schema tells Solr about the contents of the documents it will be
indexing. In the online store example, the schema would define fields for the product
name, description, price, manufacturer, and so on. Solr's schema is powerful and
flexible and allows you to tailor Solr's behavior to your application. See Documents,
Fields, and Schema Design for all the details.
• Deploy Solr to your application server.
• Feed Solr the documents your users will search.
• Expose search functionality in your application.
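The "feed" step can be sketched as an HTTP POST of JSON documents to the update handler. This is a minimal illustration, not a complete indexing client: the URL, the core name "products", and the sample product are assumptions, and the field names simply mirror the online store schema described above.

```python
import json
from urllib.request import Request

# Assumed endpoint: local Solr, hypothetical core "products"; commit=true makes
# the new documents visible to searches right away.
SOLR_UPDATE_URL = "http://localhost:8983/solr/products/update?commit=true"


def build_update_request(documents):
    """Build (but do not send) a POST request that adds the documents to the index."""
    body = json.dumps(documents).encode("utf-8")
    return Request(
        SOLR_UPDATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical product using the fields from the online store example.
products = [
    {
        "id": "1",
        "name": "Laptop",
        "description": "13-inch laptop",
        "price": 899.0,
        "manufacturer": "Acme",
    }
]
request = build_update_request(products)
# urllib.request.urlopen(request) would send it to a running Solr instance.
```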
Solr achieves fast search responses because it searches an index rather than scanning the text
directly.
This is like retrieving pages in a book related to a keyword by scanning the index at the back of
a book, as opposed to searching every word of every page of the book.
This type of index is called an inverted index, because it inverts a page-centric data structure
(page->words) to a keyword-centric data structure (word->pages).
Solr stores this index in a directory called index in the data directory.
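The inversion from page->words to word->pages can be illustrated with a few lines of Python. This is a toy sketch of the idea only; Lucene's on-disk index is far more elaborate, and the sample pages are invented for the example.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Map each word to the sorted list of page numbers on which it appears."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_number)
    return {word: sorted(numbers) for word, numbers in index.items()}


# Page-centric data: page number -> words on that page.
pages = {
    1: "solr is a search server",
    2: "lucene is a search library",
    3: "solr uses lucene",
}

# Keyword-centric data: word -> pages containing it.
index = build_inverted_index(pages)
print(index["solr"])    # [1, 3]
print(index["search"])  # [1, 2]
```

Looking up a keyword is now a single dictionary access instead of a scan over every page.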
How Solr represents data
In Solr, a Document is the unit of search and indexing.
An index consists of one or more Documents, and a Document consists of one or more Fields.
In database terminology, a Document corresponds to a table row, and a Field
corresponds to a table column.
When data is added to Solr, it goes through a series of transformations before being added to
the index. This is called the analysis phase. Examples of transformations include lower-casing
and stemming (reducing words to their stems). The end result of the analysis is a series of
tokens, which are then added to the index. Tokens, not the original text, are what is searched
when you perform a search query.
Indexed fields are fields that undergo an analysis phase and are added to the index. If a field
is not indexed, it cannot be searched on.
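To make the analysis phase concrete, here is a toy analysis chain in Python: it splits text into words, lower-cases them, and strips a few common suffixes. The suffix rules are a deliberately naive stand-in written for this example; they are not Solr's own tokenizers or filters, which are configured in the schema.

```python
import re


def naive_stem(token):
    """Strip a few common English suffixes (a toy rule set, not a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Turn raw text into the tokens that would be added to the index."""
    words = re.findall(r"[a-z0-9]+", text.lower())  # tokenize and lower-case
    return [naive_stem(word) for word in words]


print(analyze("Searching Indexes quickly"))  # ['search', 'index', 'quickly']
```

A query for "search" now matches a document containing "Searching", provided the query text is passed through the same chain before the lookup.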
Solr Features
Keyword Searching – queries of terms and boolean operators
Ranked Retrieval – sorted by relevancy score (descending order)
Snippet Highlighting – matching terms emphasized in results
Faceting – groups results into categories with counts that can be applied as filter queries
Paging Navigation – limits fetch sizes to improve performance
Result Sorting – sort the documents based on field values
Spelling Correction – suggest corrected spelling of query terms
Synonyms – expand queries based on configurable definition list
Auto-Suggestions – present list of possible query terms
More Like This – identifies other documents that are similar to one in a
result set
Geo-Spatial Search – locate and sort documents by distance
Scalability – ability to break a large index into multiple shards and
distribute indexing and query operations across a cluster of nodes
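Several of these features are switched on through request parameters. The sketch below assembles a query string that combines keyword search, filter queries, sorting, paging, and faceting. The function name and the sample field names are hypothetical; q, fq, sort, start, rows, facet, and facet.field are standard Solr query parameters.

```python
from urllib.parse import urlencode


def build_feature_params(keywords, page=1, page_size=10,
                         filters=(), sort=None, facet_field=None):
    """Assemble a Solr query string for one page of results."""
    params = [
        ("q", keywords),
        ("start", (page - 1) * page_size),  # offset of the first result on the page
        ("rows", page_size),                # number of results per page
    ]
    for filter_query in filters:
        params.append(("fq", filter_query))
    if sort:
        params.append(("sort", sort))
    if facet_field:
        params.append(("facet", "true"))
        params.append(("facet.field", facet_field))
    return urlencode(params)


# Third page of 20 results, cheapest first, with counts per manufacturer.
print(build_feature_params("laptop", page=3, page_size=20,
                           sort="price asc", facet_field="manufacturer"))
```

Paging keeps responses small because Solr only returns the requested window of rows rather than the full result set.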
A complete Architecture
Indexing Process