How Secure Is Your Enterprise Search Engine?
By Dave Phillips, head of search engine technology at Corpora
Monday, 8 August 2005 10:57 EST
Computer crime now affects 90% of UK businesses, and costs the country a total of £2.4 billion per year. Enterprise search engines play an increasingly pivotal role in today's organisations. However, most IT managers are not aware that the majority of search engines installed more than three years ago were not designed with security in mind.
In the wrong hands, these search engines could be used to expose a company's most confidential information to criminal misuse.
When search engines launched in the late 1980s, security was not a primary design objective. The designers were determined to create engines that could find and retrieve text documents from across the company repository and the Internet as efficiently as possible.
Towards the end of the 1990s, organisations slowly became aware of the potential severity of search engine security breaches. They began to demand more secure solutions from search engine manufacturers. The vendor community acknowledged the need for improved security, and some changes were made to address the risks. However, in most cases the result was merely a "papering over of the cracks". In their race to push forward on accuracy and performance, none of the search engine providers invested the time required to go back and completely rebuild their platforms to properly incorporate security into the core of their design. This mistake, born of a highly competitive market and resource constraints, will be regretted by many.
Today a very large proportion of UK businesses, from SMEs to international enterprises, are operating search engines that are effectively insecure. I would estimate that in the region of 80-90% of search engine installations in the UK are not secure enough to protect their organisations from a determined criminal - whether from inside or outside the business.
Companies spend millions of pounds creating, building and maintaining their information repositories. They store enormous volumes of highly valuable and confidential corporate and customer information. In many instances it would be relatively easy for a hacker, or even a technically aware user, to take control of the enterprise search engine. They could then steal, alter, copy or corrupt the information contained in the repository - compromising both the investment made in the information, and the security of the organisation and its customers.
If your search engine was installed more than a few years ago, then you may be laying down a red carpet for hackers and information thieves to enter your information repository. The following are the design flaws that you should look out for in older search engines:
• Administrator access via a web browser.
Until very recently, search engines tended to be designed to allow administrative access via web browsers. Savvy information thieves can easily use this route to take control of your search engine because the commands are constant for each brand of engine. For efficiency's sake, search engine manufacturers used the same standard set of commands across all their installations. Anyone, whether inside or outside your organisation, who has had administrator rights to a similar search engine in the past can use those commands to take control of your search engine via a web browser today. By offering direct HTTP access to the search engine you are leaving the side door to your information repository unlocked - inviting unwelcome visitors in.
• Lack of security checks between the engine and front end applications.
Illicit visitors have another option for gaining access to older search engines: they can pose as a valid front-end application. Most search engines that are more than a couple of years old provide absolutely no mechanism by which the search engine can validate the identity of a front-end application from which it has received a command or to which it is sending results. This lack of identity checking makes it simple for anyone with the required level of skill to create an application that poses as the valid front-end user interface, and then takes control of the search engine and the information repository. Even if your search engine vendor has plugged the web access hole, the lack of security handshakes to confirm the authenticity of front-end applications can still be enough to allow a security breach.
• Lack of indexing engine validation
Another potential route for intrusion into your search engine comes in the form of bogus indexers. A lack of secure handshakes between older engines and their indexing systems leaves an opportunity open for intruders with the knowledge to create rogue indexers that pose as authentic indexers, in order to establish a connection with the search engine. These indexers then have the power to pollute the engine and the repository with false information - compromising the integrity of the index and potentially ruining the value of the repository. Search engines should employ secure handshakes in every instance that they communicate with applications, be they user interface or indexing applications, to avoid these security breaches.
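As an illustration of the kind of handshake described above, the following is a minimal sketch in Python of a challenge-response check based on a pre-shared secret. It is not taken from any particular search product: the `SearchEngine` class, the `answer_challenge` function and the client IDs are all hypothetical, and a real deployment would also need secure key distribution and an encrypted transport. The same check could apply to a front-end application or to an indexer.

```python
import hashlib
import hmac
import secrets


class SearchEngine:
    """Hypothetical engine side of the handshake (illustrative only)."""

    def __init__(self, client_keys):
        # client_keys maps a client ID to its pre-shared secret (bytes).
        self._client_keys = client_keys
        self._pending = {}  # client ID -> outstanding challenge

    def issue_challenge(self, client_id):
        # A fresh random nonce per attempt stops old responses being replayed.
        nonce = secrets.token_bytes(32)
        self._pending[client_id] = nonce
        return nonce

    def verify(self, client_id, response):
        # Each challenge is single-use: discard it whether or not it verifies.
        nonce = self._pending.pop(client_id, None)
        key = self._client_keys.get(client_id)
        if nonce is None or key is None:
            return False
        expected = hmac.new(key, nonce, hashlib.sha256).digest()
        # compare_digest avoids leaking information through timing.
        return hmac.compare_digest(expected, response)


def answer_challenge(key, nonce):
    """Client side: prove knowledge of the key without transmitting it."""
    return hmac.new(key, nonce, hashlib.sha256).digest()
```

In this sketch the engine would refuse any command, query or index update from a client that has not passed `verify`. It only authenticates the client to the engine; a fuller design would presumably have the client verify the engine in the same way.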
• Lack of document access control
The key security flaw of older enterprise search systems is that they do not control the access that the search engine has to documents within the repository. They grant the engine unlimited access at all times to all of the repository. The engine in turn returns 100% of the possible results to the front-end application that requested them, and only at that point are the user's access rights to each document checked. There are serious flaws in this approach. Firstly, regardless of access rights, the user gets to see the title of every document. This is problematic because some titles, e.g. "Merger proposal with DC Labs - Confidential!", can reveal the key elements of their contents. Secondly, this approach gives access to the entire repository to anyone who illicitly gains control of the front-end application. Search engines are far more secure when they return to the front-end application only those documents which the user has the right to see.
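The difference can be sketched in a few lines of Python. This is an assumption-laden toy, not any vendor's design: the `Document` record, the group-based permission model and the title-only matching in `search` are all hypothetical simplifications. The point is that the access check happens inside the engine, before anything is returned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    title: str
    allowed_groups: frozenset  # groups permitted to see this document


def search(index, query, user_groups):
    """Return only the matching documents the user is entitled to see."""
    user_groups = frozenset(user_groups)
    results = []
    for doc in index:
        if query.lower() not in doc.title.lower():
            continue  # not a match for the query
        if doc.allowed_groups.isdisjoint(user_groups):
            continue  # no access right: do not reveal even the title
        results.append(doc)
    return results
```

Because unauthorised documents never leave the engine, a compromised front end in this sketch can see no more than the user it claims to represent.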
In conclusion, information security risks threaten many of today's organisations. Companies should insist that their enterprise search engine has been designed from the ground up to provide the highest levels of security possible. Far too many solutions have gaping security cracks which have been papered over in the interests of haste. Organisations cannot afford to take risks with their confidential information or that of their customers.