Apache Solr Introduction and Server Setup



Introduction to Apache Solr

Apache Solr is a search server built as a web application in Java. End users do not use it directly to search organizational data. Instead, software developers (often called search engineers) integrate their applications with Apache Solr to use its search features. In other words, the Solr interface (the way to interact with Solr) is built for programs, not for humans. Developers first submit their data to Solr through its RESTful API, and Solr indexes it. Once the data is indexed, developers send search queries, with filters to refine the results, to the Solr server through the same API. Solr returns the results in JSON or XML format, which the application parses and displays to end users.

Apache Solr Admin Interface

Apache Solr provides an Admin Interface, but it is not aimed at end users because it requires technical knowledge to use. Search engineers use the Admin Interface to test search queries, modify schemas, and perform other server administration tasks. The Admin Interface is part of the Apache Solr software, but it talks to the server through the same Solr API that user apps use.

How to Interact with Apache Solr

Solr provides a JSON/XML-based interface over HTTP. To talk to Apache Solr, a program sends an HTTP request: documents and schema changes are typically sent as JSON or XML in the request body, while search queries are usually passed as request parameters (they can also be written as JSON). Solr interprets the request and responds in JSON or XML format. The response data depends on the type of request sent. Common requests index data, update the index, change the schema, or run searches with different filters. The set of all Solr interfaces that a program can invoke is called the Solr API (Application Programming Interface). So the Solr API is a JSON/XML-based endpoint that programs use over HTTP, because Solr is a web application, or more precisely a web service (programs interact with it, not humans).
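To make the request and response shapes concrete, here is a minimal Python sketch of a search request. The core name "articles", the field names, and the sample response text are my own assumptions for illustration; only the /select endpoint and the q, fq, rows and wt parameters come from Solr itself. The sketch only builds the URL and parses a hand-written sample response, so it runs without a Solr server.

```python
import json
from urllib.parse import urlencode

# Assumptions: Solr listens on localhost:8983 and has a core
# named "articles" (a hypothetical name used only for illustration).
SOLR_BASE = "http://localhost:8983/solr"
CORE = "articles"


def build_select_url(query, filters=(), rows=10):
    """Build the URL of a search request against the /select endpoint."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    # Each filter query (fq) narrows the result set further.
    params += [("fq", f) for f in filters]
    return f"{SOLR_BASE}/{CORE}/select?{urlencode(params)}"


def extract_docs(response_text):
    """Pull the list of matching documents out of a JSON response."""
    body = json.loads(response_text)
    return body["response"]["docs"]


# A hand-written sample in the shape Solr uses for JSON search responses.
sample_response = (
    '{"responseHeader": {"status": 0},'
    ' "response": {"numFound": 1, "start": 0,'
    ' "docs": [{"id": "1", "title": "Hello Solr"}]}}'
)

url = build_select_url("title:solr", filters=["category:blog"])
docs = extract_docs(sample_response)
print(url)
print(docs)
```

In a real application, the URL would be fetched with any HTTP client and the response body passed to extract_docs.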

Which Types of Software Can Use Solr for Their Search Module

The Apache Solr API is independent of any programming language or platform. Solr does not care which programming language composed the JSON or XML; in fact, it has no way of knowing which language drafted the message. So it does not matter which programming language your software is written in: it can talk to Solr because the Solr API is JSON/XML-based. Almost all modern programming languages can parse and generate JSON and XML messages.
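As one illustration, here is how a Python program might compose a JSON message that submits documents for indexing. Any other language with a JSON library and an HTTP client could do the same. The core name "articles" and the document fields are hypothetical; the sketch only prepares the request and does not send it, so it runs without a Solr server.

```python
import json
from urllib.request import Request


def build_index_request(docs, core="articles",
                        base="http://localhost:8983/solr"):
    """Prepare (but do not send) an HTTP POST that submits documents."""
    # Solr's /update endpoint accepts a JSON array of documents.
    payload = json.dumps(docs).encode("utf-8")
    return Request(
        f"{base}/{core}/update?commit=true",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_index_request([{"id": "1", "title": "Hello Solr"}])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing req to urllib.request.urlopen would actually send it to a running Solr instance.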

The software into which you integrate Apache Solr as a search module can be a web application, a mobile application, or even a desktop application. All you need to do is create JSON/XML messages and send them to Solr over HTTP. Why HTTP? Because Solr runs as a web application. The diagram below shows the positions of the server, Solr API, user apps, and Admin UI.



Why Apache Solr Rather Than a Relational Database

Relational Database Management Systems (RDBMS) are optimized for data integrity, not for search. An RDBMS does provide basic search operations, such as SELECT with WHERE, grouping, joins, and at most full-text indexing, but users expect more than that. Below I list some very common features that users expect but that databases do not provide off the shelf:

  1. Auto Suggest: When you start typing a search query in Google, it automatically shows you related searches.
  2. Keyword Highlighting: The system highlights the searched keywords in the results so that users can quickly spot the data they need.
  3. More Like This: On each post of a blog, you may want to show related posts. On a product details page, you may want to show related products that match the currently selected product.
  4. Most Relevant Records First: When a user enters multiple keywords, they expect not only results that contain those keywords; the posts, products, or articles that contain most of those keywords should come first.
  5. Spelling Error Correction: When a user makes a minor spelling mistake in a keyword, a database cannot find the closely matching keywords. Solr provides a mechanism to detect spelling errors so that users see some results rather than a "Record not found" message.
  6. Synonym Search: Users do not always know the exact keywords used in a document, product, or post. If a user enters a synonym, a relational database will not find the results. Apache Solr lets you integrate a synonyms list so that the system still shows results when the user types closely related words.
  7. Prioritize Fields (Boost Fields): What if you want to search across two fields but prefer one over the other, or quantify how much more important field one is than field two? Solr lets you boost fields by the required factor. Relational databases have no such option.
  8. Geospatial Search: Suppose you want to find the restaurants nearest to the user's current location, or the real estate properties near a customer's preferred location. Apache Solr lets you run such queries without getting into the mathematics of distance calculations.
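Several of the features above are switched on through request parameters on a search query. The Python sketch below assembles such a parameter set. The field names (title, body, location) are hypothetical, and I am assuming that highlighting, spell checking, and a spatial field are already configured on the Solr side; without that configuration the parameters alone would not be enough.

```python
from urllib.parse import urlencode, parse_qs


def build_feature_params(keywords, lat, lon, radius_km):
    """Assemble request parameters that enable several search features."""
    return {
        "q": keywords,
        "defType": "edismax",    # query parser that supports field boosts
        "qf": "title^3 body",    # a match in title counts 3x a match in body
        "hl": "true",            # keyword highlighting
        "hl.fl": "title,body",   # fields to highlight
        "spellcheck": "true",    # spelling suggestions (needs server config)
        "fq": "{!geofilt}",      # keep only documents within the radius
        "sfield": "location",    # hypothetical spatial field
        "pt": f"{lat},{lon}",    # center point as "latitude,longitude"
        "d": radius_km,          # radius in kilometers
        "wt": "json",
    }


params = build_feature_params("pizza restaurant", 31.52, 74.35, 5)
query_string = urlencode(params)
print(query_string)
```

The resulting query string would be appended to a core's /select URL.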
There are certainly many other features, but I hope the list above is enough to convey when Apache Solr should be your first preference over writing a search solution on an RDBMS. Next, I will explain how to install Apache Solr and run it. (Indexing data and running queries will be explained in a separate article.)

Solr Installation

Download the compressed file of the latest Apache Solr version and unzip it; I am using Solr 6.4.2. It is ready to run. Make sure the Java Runtime Environment (JRE) is already installed on your computer and that your command line recognizes the java command. After you unzip it, its contents look like this:



Start Solr

Using the command line, go to the bin directory and enter the following command:

D:\solr-6.4.2\bin>solr start

You should see the following message, which shows that the Solr instance has started:

Waiting up to 30 to see Solr running on port 8983
Started Solr server on port 8983. Happy searching!
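Besides reading this console message, a program can confirm that the server is answering. The Python sketch below asks Solr for its system information and treats any connection or parsing failure as "not running". The function names are my own, and the host and port are assumed to be the defaults shown above.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def system_info_url(host="localhost", port=8983):
    """URL of Solr's system information endpoint, asking for JSON."""
    return f"http://{host}:{port}/solr/admin/info/system?wt=json"


def solr_is_up(host="localhost", port=8983, timeout=3):
    """Return True if Solr answers the system info request successfully."""
    try:
        with urlopen(system_info_url(host, port), timeout=timeout) as resp:
            body = json.load(resp)
    except (URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return False
    # A status of 0 in the response header means the request succeeded.
    return body.get("responseHeader", {}).get("status") == 0


print(solr_is_up())
```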

Solr is running on port 8983. To open the Admin Interface, point your web browser at:

http://localhost:8983/

The following page should appear in your browser.



In the next post, I will explain how to create a core, index some sample data, and run basic search queries using the Admin UI.
