The Sphider search engine is a nice, compact program you can add to your site to provide a search function without having to go through Google or some other paid service. It offers all the features you’d expect from a search engine, including weighted searches, customizable output, and an easy-to-use administrator panel. The one thing it lacks, however, is the ability to search substrings.

Let’s say, for instance, you have a page on your site talking about an office you have in Europe. Searching for “Europe” comes up with the page without any problems, but if you type in “Euro”, you get nothing. For people used to using Google, this is unacceptable, because it tells the user that there are no pages with any version of “Euro”, including “European”.

The reason for this is twofold:

1. Sphider uses a series of tables in a database to organize and weight all the words on each page. To make the search faster, rather than one large database for each word, they are split into 17 databases, based on the leading character of the word when the MD5 hash of the string is calculated. For those of you unfamiliar with MD5, here is the official definition.

From RFC 1321 – The MD5 Message-Digest Algorithm:

The MD5 message-digest algorithm takes as input a message of arbitrary length and produces as output a 128-bit “fingerprint” or “message digest” of the input. The MD5 algorithm is intended for digital signature applications, where a large file must be “compressed” in a secure manner before being encrypted with a private (secret) key under a public-key cryptosystem such as RSA.

Just think of it as a function which takes a word and spits out a consistent string of gibberish. For Sphider’s purposes, the first digit of that string is the number of the table of words to look through. Since there are 17 possible values, that means that the search only has to look through one table, which is 1/17 of the total number of words. In essence, the search is faster.

The problem comes in when a term like “Euro’ puts out a different MD5 than “European’, which means that Sphider will try to look in a different table. Because of this, a search for “Euro’ will never come up with “European’.

2. The second reason involves the query that Sphider sends to the database. Here it is:

$query1 = "SELECT link_id from ".$mysql_table_prefix."link_keyword$wordmd5, ".$mysql_table_prefix."keywords where ".$mysql_table_prefix."link_keyword$wordmd5.keyword_id= ".$mysql_table_prefix."keywords.keyword_id and keyword='$searchword'";

The $wordmd5 portion is the MD5 code calculated earlier, which tells which table to search for that particular word. What you’ll notice, though, is the portion for keyword=’$searchword’. Since the search term and word in our example aren’t equal, this will never turn up a correct result. What we need to do is alter the equation to not only find things that are exactly equal, but are like each other. The new portion of the code we end up with is:

keyword LIKE '%".$searchword."%'

The LIKE operator is MySQL code for a simple pattern matching operation, and the % characters indicate that there can be any number of other characters before and after the search word. So, “asdfasdfeuro”, “euroasfasdf”, and “asdfasdfeuroasdf;alk” would all be acceptible.

Now that the basic search query is altered to match variations of the search term, and not just the search term exactly, we have to alter it further so that it searches every table of keywords, because those other words which include the keyword as a substring might have a different MD5 code, and thus be in a different table.

This is where the hack gets ugly.

Yes, there are longer ways of organizing things that would provide a more elegant solution, but the quickest, simplest solution is to simply make a query for each table, and combine them all with the UNION MySQL term into one big query. Hold your breath:

$query1 = "(SELECT distinct link_id, weight, domain from ".$mysql_table_prefix."link_keyword0, ".$mysql_table_prefix."keywords WHERE ".$mysql_table_prefix."link_keyword0.keyword_id = ".$mysql_table_prefix."keywords.keyword_id and keyword LIKE '%".$searchword."%')".
"UNION (SELECT distinct link_id, weight, domain from ".$mysql_table_prefix."link_keyword1, ".$mysql_table_prefix."keywords WHERE ".$mysql_table_prefix."link_keyword1.keyword_id = ".$mysql_table_prefix."keywords.keyword_id and keyword LIKE '%".$searchword."%')".
"UNION (SELECT distinct link_id, weight, domain from ".$mysql_table_prefix."link_keyword2, ".$mysql_table_prefix."keywords WHERE ".$mysql_table_prefix."link_keyword2.keyword_id = ".$mysql_table_prefix."keywords.keyword_id and keyword LIKE '%".$searchword."%')".
"UNION (SELECT distinct link_id, weight, domain from ".$mysql_table_prefix."link_keyword3, ".$mysql_table_prefix."keywords WHERE ".$mysql_table_prefix."link_keyword3.keyword_id = ".$mysql_table_prefix."keywords.keyword_id and keyword LIKE '%".$searchword."%')".
"UNION (SELECT distinct link_id, weight, domain from ".$mysql_table_prefix."link_keyword4, ".$mysql_table_prefix."keywords WHERE ".$mysql_table_prefix."link_keyword4.keyword_id = ".$mysql_table_prefix."keywords.keyword_id and keyword LIKE '%".$searchword."%')".
"UNION (SELECT distinct link_id, weight, domain from ".$mysql_table_prefix."link_keyword5, ".$mysql_table_prefix."keywords WHERE ".$mysql_table_prefix."link_keyword5.keyword_id = ".$mysql_table_prefix."keywords.keyword_id and keyword LIKE '%".$searchword."%')".
"UNION (SELECT distinct link_id, weight, domain from ".$mysql_table_prefix."link_keyword6, ".$mysql_table_prefix."keywords WHERE ".$mysql_table_prefix."link_keyword6.keyword_id = ".$mysql_table_prefix."keywords.keyword_id and keyword LIKE '%".$searchword."%')".
"UNION (SELECT distinct link_id, weight, domain from ".$mysql_table_prefix."link_keyword7, ".$mysql_table_prefix."keywords WHERE ".$mysql_table_prefix."link_keyword7.keyword_id = ".$mysql_table_prefix."keywords.keyword_id and keyword LIKE '%".$searchword."%')".
"UNION (SELECT distinct link_id, weight, domain from ".$mysql_table_prefix."link_keyword8, ".$mysql_table_prefix."keywords WHERE ".$mysql_table_prefix."link_keyword8.keyword_id = ".$mysql_table_prefix."keywords.keyword_id and keyword LIKE '%".$searchword."%')".
"UNION (SELECT distinct link_id, weight, domain from ".$mysql_table_prefix."link_keyword9, ".$mysql_table_prefix."keywords WHERE ".$mysql_table_prefix."link_keyword9.keyword_id = ".$mysql_table_prefix."keywords.keyword_id and keyword LIKE '%".$searchword."%')".
"UNION (SELECT distinct link_id, weight, domain from ".$mysql_table_prefix."link_keyworda, ".$mysql_table_prefix."keywords WHERE ".$mysql_table_prefix."link_keyworda.keyword_id = ".$mysql_table_prefix."keywords.keyword_id and keyword LIKE '%".$searchword."%')".
"UNION (SELECT distinct link_id, weight, domain from ".$mysql_table_prefix."link_keywordb, ".$mysql_table_prefix."keywords WHERE ".$mysql_table_prefix."link_keywordb.keyword_id = ".$mysql_table_prefix."keywords.keyword_id and keyword LIKE '%".$searchword."%')".
"UNION (SELECT distinct link_id, weight, domain from ".$mysql_table_prefix."link_keywordc, ".$mysql_table_prefix."keywords WHERE ".$mysql_table_prefix."link_keywordc.keyword_id = ".$mysql_table_prefix."keywords.keyword_id and keyword LIKE '%".$searchword."%')".
"UNION (SELECT distinct link_id, weight, domain from ".$mysql_table_prefix."link_keywordd, ".$mysql_table_prefix."keywords WHERE ".$mysql_table_prefix."link_keywordd.keyword_id = ".$mysql_table_prefix."keywords.keyword_id and keyword LIKE '%".$searchword."%')".
"UNION (SELECT distinct link_id, weight, domain from ".$mysql_table_prefix."link_keyworde, ".$mysql_table_prefix."keywords WHERE ".$mysql_table_prefix."link_keyworde.keyword_id = ".$mysql_table_prefix."keywords.keyword_id and keyword LIKE '%".$searchword."%')".
"UNION (SELECT distinct link_id, weight, domain from ".$mysql_table_prefix."link_keywordf, ".$mysql_table_prefix."keywords WHERE ".$mysql_table_prefix."link_keywordf.keyword_id = ".$mysql_table_prefix."keywords.keyword_id and keyword LIKE '%".$searchword."%')".
"order by weight";

What this does is disregard the MD5 code for the search term, and search every table anyways…even the ones Sphider thinks shouldn’t have the search term in it. While this does theoretically slow the search down, if the site isn’t hundreds of pages long, the search time won’t be very much longer. What is does, is turn up results for “European” when someone searches for “Euro”, which is well worth the sacrifice of a few tenths of a second in search time.

 

Sphere: Related Content