Implementing MySQL full-text search

There comes a time in every developer's life when they are required to implement some sort of a search functionality. Most of us start with the easiest choice: a MySQL "like" query.

So, if someone searched for the term "I like bananas", the query probably looked something like this.



SELECT * FROM posts WHERE title LIKE '%I like bananas%'


That'll be alright for the most part, but wait. That's only good as long as you are looking for posts with titles encompassing that EXACT phrase. In other words, if there's a post with a title that contained "I like ripe bananas", it wouldn't get queried. Hmm...

Full-text search for the rescue!

So, what's a full-text search? It's a type of search that actually looks at each search keyword separately, and tries to match combinations of those keywords, as opposed to the exact phrase the user entered. In other words, it'll solve our problem above. (If you would like to read up more about full-text searching, check out this wikipedia article)

Alright, so what do we do next?

Let's say that the table that stores the posts is called "posts" and that you would like to run the search on the "title" (VARCHAR 255) and "description" (TEXT) fields of that table. (Note: If you have a field in BLOB format, you should first convert it to TEXT before proceeding)

  1. First, we need to add a fulltext index to the columns we would like to run the search on. In our example, you can do that by running this query

    ALTER TABLE posts ADD FULLTEXT(title, description);
    
  2. Now that you have the fields indexed, add a few rows of data.
  3. Once you have around 5 - 6 rows (read the "Gotchas" section below for why this is important), run the following query

    SELECT * FROM posts WHERE MATCH(title, description) AGAINST ('I love bananas')
    
     

Provided that you have the word "love" or "bananas" in any of the post titles or descriptions, this should pull them up.

It's that easy!

But, there are some very serious gotchas to be aware of:

  1. If your search term is too short, the search will NOT return anything. (I lost around 3 hours on this one). It turns out that MySQL has a lower limit on how many characters you need to enter in the search term before it even bothers doing a full-text search. There are 2 solutions to this problem. You can either tell the users via instructions (and/or using validations), that they need to enter at least x number of characters to perform a search. Or, you can revert to a "MySQL like" search when the search term is too small. Both are acceptable solutions depending on your particular scenario.

    To find out what this full-text search lower limit is, run the following query and look at the "ft_min_word_len" value.

    show variables like 'ft_%'
    
  2. If the number of records you have is too few, MySQL will once again return an empty result set (Aargh... another 2 hours of my life down the drain!). So, make sure you have a decent number of rows first before running a full-text search query. I couldn't find any good documentation that pointed this little nugget out. I hope this post save someone else a few hours of their time.
  3. The way MySQL orders the results might be a bit weird in some cases. MySQL makes this decision based on a score that it internally assigns each result. You can print it out by doing the following query

    SELECT *, MATCH(title, description) AGAINST ('I love bananas') AS score FROM posts WHERE MATCH(title, description) AGAINST('I love bananas')
    

    Basically, if the search term appears in ALL the rows, then MySQL has to take a call on what rows are more important than others. (Getting this one thing right is what makes Google such a great search engine!). Otherwise, it'll try to give a higher score to rows that contain more of the keywords than ones that don't. The best way to figure this out is by trying a few searches and analysing the scoring mechanism.

I wanted to keep this post simple, and not dvelve into things like binary and natural language searches. The idea was to point out a few crucial "gotchas" and start you off on the right path. You can find more information about full-text search at http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html.>

Filed under  //   Full-text Search   Gotchas   MySQL   MySQL Full-text search   Search  

Generating unique referrer codes in PHP

One of the things I had to do today was to come up with a way to generate a unique referrer code for each user on a site, so that they can use that to invite more friends to join the site. Naturally, each of these referrer codes HAVE to be unique for this to work (since we also have a reward scheme in place where the referrer gets a monetary compensation for each signup).

It may seem like doing something like md5($username) would be a good approach. But while md5 is an easy hashing solution, it's not unique. There's a very real possibility that 2 completely different usernames could result in the exact same md5 hash.

After fiddling around on php.net for a while, I came across a function called uniqid(), which supposedly generates a unique GUID that you can use. However, deeper inspection of the function showed that it is doing this based on the system time. Now, this sounds pretty unique, but with something like a referrer code, anything short of absolutely unique won't work. We do have to take into account the case where 2 users register at the exact same millisecond, which would result in both of them getting the same GUID as their referrer key.

Fortunately, this function also takes a prefix as an optional parameter to prepend to the generated GUID. Now, this is also tricky. Since the GUIDs are not all the same length, when you add a prefix to it, you have once again created the possibility for duplication. Take the following simplified example. (The bolded letters are part of the GUID, whereas the unbolded letters form the prefix)

abcdefgh
abcdefgh

As you can see, the results are not guaranteed to be unique.

The solution is to ensure that the prefix is terminated with a character that is not in the set of characters used to generate the GUID. Further inspection of the uniqid() function showed that the return values will never contain an underscore character, which meant that as long as the prefix ends with a "_", then the resulting combined code MUST be unique.

So far so good. But what can we use as a prefix which uniquely identifies a given user. The best candidate seemed to be the primary key in the users table. In order to obfuscate things a bit further, I also took the additional step of converting this id to hex, and then using that as the prefix (with an underscore added to the end).

The result of all of this was:

$referrer_code = uniqid(dechex($user_id).'_');

Simple, and unique :). I keep reading articles regarding how "practically unique" is good enough, as opposed to "guaranteed unique" (http://en.wikipedia.org/wiki/Universally_unique_identifier). But when it involves actual money changing hands, can we truly take that chance? I believe that what I've come up with does generate a truly unique referrer code for each user. (If you don't agree, please post a comment. I appreciate all types of feedback :) ). Also, if you can suggest easier / better ways to achieve the same, I'd love to hear about them!

Hope this helps someone out there...

Filed under  //   GUID   hash   md5   php   referrer  

Using regular expressions in Codeigniter database queries

A typical database "select query" in Codeigniter using Active Record can look like the following

$this->db->where('product_name', 'iPhone');
$query = $this->db->get('products');

This is great, as long as you know EXACTLY the name of the product you are searching for. But, what if you just want to see whether the product name starts with a certain word, ends with a certain word, or contains a certain word? For that, you can use the following

$this->db->like('product_name', 'Apple', 'before'); // Selects all products starting with the word "Apple"
$this->db->like('product_name', 'iPhone', 'after'); // Selects all products ending with the word "iPhone"
$this->db->like('product_name', 'phone', 'both'); // Selects all products containing the word "phone"

This covers most of the situations you'd run into while doing select queries. However, occasionally, you need to do queries that involve much more complex select criteria. And the only way to accommodate for these is by using regular expressions.

I know a lot of you cringe at the thought of regular expressions. However, they can really help you out of tough spots, and are well worth learning. A good starting point is http://www.regular-expressions.info/quickstart.html

Let's say you wanted to select products that have names starting with a number. Using regular expressions, you can write the following Active Record query.

$this->db->where('product_name RLIKE', "'^[0-9].*'", FALSE);
return $this->db->get('products');

With this, you can utilise the full power of regular expressions in your database queries to select records that follow specific and complex patterns.

 

Filed under  //   Active Record   Database   Query   Regular Expressions   codeigniter  

Handling UTF-8 with Codeigniter

For one of the projects I'm working on, I had to write a script that fetches RSS feeds from various sources and stores them in a database periodically. My initial thought was to use SimplePie, and write a PHP script which then gets called repeatedly via CRON. However, given the number of sources I'd have to pull posts from etc, I decided instead to go with Python. The idea was to write the feed parsing bit in Python and then to display these posts using Codeigniter (since that's what the full application was written in)

So, I wrote a quick python script with Universal Feed Parser, which worked great. But, I started noticing that some of the sources I was pulling data from had a lot of UTF-8 characters.

Now, Python is brilliant at handling UTF-8. But PHP... not so much (which hopefully would change with PHP6). I hadn't run into UTF-8 related issues with any of the applications I've developed till now. So this was a bit of a curve ball.

The way to tackle this involves a few steps

  1. Install the "mbstring" PHP extension
  2. Configure your php.ini file to use the extension (add "php_mbstring.dll" or "php_mbstring.so" in the extensions area depending on the platform)
  3. Change the database and each of the tables over to UTF-8 (Under MySQL, you can use "utf8_general_ci")
  4. Modify various helpers in Codeigniter to take into account UTF-8 (more details in the liks below)
  5. Optionally, write your own helper to convert UTF-8 characters to regular single byte characters

This sounded like a pretty daunting task at first, but the following blog posts helped me out immensly. Everything listed above is clearly explained in the posts below. Also, pay close attention to the comments under those posts as well, because they have fixes for some "gotchas" the original posters had missed.

 

Filed under  //   CRON   PHP6   Python   RSS   SimplePie   UTF-8   Universal Feed Parser   codeigniter   mbstring  

Customising Error pages on Codeigniter

CodeIgniter comes with a number of very basic error templates that you can use right off the bat. These are stored in the  "application/errors" folder under these file names

  • error_404.php - This is used to display the standard 404 error message
  • error_db.php - For database related error messages
  • error_general.php - For general error messages
  • error_php.php - A wrapper for any PHP error messages that are thrown

In order to throw an error, you can do the following in a controller function.

show_error('Your error message');

This uses the error_general.php template to render your error message

Instead of just throwing an error under one function in the controller, you can also apply this to the whole controller by putting it in the constructor, (In this example, 'auth_model' is a part of an auth library, and the is_admin() function checks for user permissions of the logged in user)

function Site()

  {

    parent::Controller();

    $this->load->model('auth_model');

    if (!$this->auth_model->is_admin())

    {

        show_error('You do not have permissions to access this page');

    }

  }

This will prevent any user without the right privileges from accessing any of the functions of that controller.

As you can see though, these templates are extremely basic and very Codeigniter-looking, which probably isn't going to cut it for your web application.

So here's how you can make it fit your overall application better.

I am assuming that you use partials to divide up your view templates as follows

<?$this->load->view('header')?>

<p> Site content goes here </p>

<?$this->load->view('footer')?>

where header.php and footer.php hold your header and footer code.

In the error pages, (e.g. error_general.php) you can do the following

<?$this->ci =& get_instance()?>

<?$this->ci->load->view('header')?>

    <h2 id="page-heading"><?=$heading?></h2>

    <?=$message?>

<?$this->ci->load->view('footer')?>

Note: You have to use the line $this->ci =& get_instance() to get the current Codeigniter instance in order to load the views.

By doing it this way, since your styles are included in the header.php, this will now make your error messages look just like another page in your application.

For further reference, please check out the following resources

Filed under  //   codeigniter   errors   template  

Unit Testing for Codeigniter

If you are dealing with OOP (Object Oriented Programming), and if you are working on a large project that requires the contributions of a whole team, things can get ugly pretty fast. In a case like this, TDD (Test Driven Development) can really cut down on the number of bugs by making sure that whatever you introduce to the project doesn't have ripple effects that can cause other key requirements to fail.

During university, one of the programming languages that I was introduced to was Eiffel, which I learnt from a book by one of the Chief Architect of Eiffel Software (and a pioneer of OOP): Bertrand Meyer. It's one of the best Object Oriented languages out there which also comes with a beautiful IDE. And it's here that I first got exposed to TDD. The whole idea of writing tests before we even start building the app sounded pretty weird at first. But it didn't take long for me to discover the benefits of writing code this way.

In the unit test view of Eiffel, the image of a menacing red light was displayed in front of each test case. And as these test cases passed, each of those red lights turned green. And if anything you did later broke one of the earlier test cases, you would be able to see it straight away. The goal was to get the whole thing to light up green. It was quite satisfying to code this way, since it sort of introduced a "gamey" element to the whole process (sort of like leveling up in WOW :) )

Then I got into web based languages. I was disappointed to find that web frameworks in general (and especially my framework of choice, Codeigniter) didn't have very comprehensive test suites.

Having said that, here are a few options for Codeigniter

  • Codeigniter does have an inbuilt unit testing library that you can use to run tests on your models
  • There's a 3rd party unit testing library called TOAST created by Jens Roland
  • There's also an approach suggested by Jamie Rumbelow here (which is probably not for beginners)
Filed under  //   TDD   TOAST   codeigniter   eiffel   testing   unit testing  

Loving the new Mobile jQuery UI

As you know, I've always been a huge fan and supporter of jQuery and jQuery UI. Since I do most of my development in CodeIgniter, jQuery is a really nice hassle free way of dealing with the eye candy effects as well as the AJAX stuff (now made even easier by the jQuery driver in Codeigniter 2!)

The one complaint I've had with jQuery UI was their lack of support for mobile devices. I mean, their UI widgets are still visible on a mobile device, but not very usable (e.g. date picker)

So, I've been using jQtouch. There are a lot of good things in that framework, and it's also built on jQuery. It even taps into the location service of the iphone and provides some other neat tools. But, the problem I had was with the strict way they were enforcing application structure. For an example, by default, the framework expects the whole site to be contained in one page, with various divs that you toggle when menu items are clicked on. As for content, you have to either load everything when the user first loads the site, or have to do a lot of ajax calls. For me, this has always been a big pain to deal with.

Today, the jQuery UI guys announced their plans for Mobile jQuery UI, which looks fantastic! Already being a heavy user of jQuery UI, this is the perfect solution I've been looking for.

While you wait for the official release, you can download the AI file and take a peek at what they've proposed.

Jquery_mobile-widgets

Filed under  //   codeigniter   javascript   jqtouch   jquery   jquery ui   mobile   mobile jquery ui  

Love of algorithms...

As people who know me personally would tell you, I'm a sucker for faster and more efficient algorithms. "Introduction to Algorithms" was hands down my most favourite course in university, and the one that kept me thinking the most after the class was over.

Then, once my university life was over, I got dropped head first into the "real world". What a disappointment that was! Maybe it's because of the path I decided to take: Web application development.

Armed with a web framework (in my case, Codeigniter), most of what I ended up doing day and night was just getting models, views, and controllers to play ball with each other, and do very basic SQL queries to do CRUD operations. What a bore!

Just recently, out of sheer boredom, I decided to resuscitate my passion for algorithms and started browsing around. I was thrilled to find a few algorithms that I can use right off the bat, within my very own domain of Web app development.

If you are working on any projects that require you to handle large amounts of data, I hope you'll find these useful. Every little thing helps :)

Filed under  //   algorithms   binary search   php   quicksort   searching   sorting  

Codeigniter 2.0!

I don't know about you, but I'm pretty psyched about Codeigniter 2.0!

It's choke full of good stuff like "Drivers", "Application packages" and a new Javascript Library Driver which also contains jQuery (YAY!)

They've also dropped Plugins (thank god for that... no more gray area between Libraries and Helpers) and PHP4 support (It's about time!)

You can read more at,
And you can grab yourself a copy from http://bitbucket.org/ellislab/codeigniter/overview (Not fully released yet)
Filed under  //   codeigniter   framework   jquery  

What happened to Facebook Marketplace?

When it first came out, I remember using Facebook Marketplace (http://apps.facebook.com/marketplace) almost on a daily basis to look for deals, jobs etc. Then suddenly, Facebook stopped pushing it. Now it's buried under a pile of apps, and no one barely uses it anymore. (Well... that's at least the case in Toronto, Canada)

I think it's a shame that such a great tool is given such a low priority by Facebook. Having said that, I can see Facebook's point of view as well. 

Facebook has always followed a privacy model where your profile by default is only fully viewable by someone who has mutually befriended you. But the power of Facebook Marketplace lies in the ability to find out a lot of details regarding a seller before you commit to buying something (which its major competitors - Craigslist, Kijiji etc. lack). Seeing an actual human being behind the listing obviously boosts up the chances of someone buying the product being sold.

In other words, a seller who wants to convince buyers to buy what they are selling, would have to crank down the privacy as low as possible so that the buyers (who are random strangers) get to see their profile as much as possible. As you can see, this doesn't exactly go hand in hand with the "friends-only" type privacy model that Facebook users tend to follow.

If a service like Twitter, which follows a completely open privacy model, decides to do more extensive profiles and also sets up their own marketplace, I can clearly see it being a killer product. You would be able to get to know the seller not just by their profile info, but also by the type of tweets they make etc. C'mon Twitter... how about it?
Filed under  //   buy and sell   facebook   marketplace   privacy   twitter