
One of the first things new dotCMS users run into is the need to search content.  By now I am assuming that you are familiar with how to create new structures in dotCMS and how to use the pullContent macro to pull the contentlets in your structures.  If you are not, I suggest you head over to the dotCMS documentation site and familiarize yourself with both.  In this brief tutorial I hope to show you how to take that pullContent macro to the next level.

What this tutorial is not about

I am specifically covering the pullContent macro in this tutorial.  The pullContent macro can only be used to search for contentlets, not pages.  There is some documentation available on the dotCMS documentation site if you are looking to set up spindle or implement a Google-based site search.  I am also not going to cover pulling content directly from the dotCMS database.  I do realize that there are times when you will need to hit the database for content, but I would strongly advise against doing so.  Come v1.9, this should no longer be an issue, but more about that later.

A little bit of setup

The first thing we need is a structure.  You can use any structure you already have, or create a new one.  Your choice.  The structure I am going to use is called “Clery Log Entry”.  This is a structure I created for the police department to log their Clery stats.  It really is not important what that is, but here is the layout of my structure:

Label         Variable Name   Index Name
ICR Number    icrNumber       text1
Offense       offense         text2
Clery Stats   clearyStats     text3
Location      location        text4
Reported      reported        date1
Occurred      occurred        date2
Description   description     text_area1
Disposition   disposition     text_area2

If you’ve just created this structure you’ll need to make sure you stick some test data into it so you can see the results as you build the code.

Building the Code

Let’s jump into it!  To start off, let’s set up a simple pullContent for this structure.  The first thing you will need is the structureInode.  I assume you know how to get that by now, but just in case:  Start by going to the Search tab.  Then select the structure you are using and click the search button.  At the bottom of the page (top-right if you are using 1.9) there is a “show query” link.  Click on that and you should get something like:

Query: +structureInode:636931

There you have it.  Now, let’s build the simple pullContent:

## 1. Set up Lucene Query
#set($query = "+structureInode:636931 +live:true")
#set($orderBy = "text1 desc")
#set($limit = "30")

## 2. Pull The Content
#pullContent("$query" "$limit" "$orderBy")

## 3. Show the Content
#if($list.size() > 0)
  #foreach($content in $list)
    <div class="entry">#editContentlet($content.inode)
      <h3>$!{content.icrNumber}
      <strong>$!content.offense</strong></h3>
      <span class="stats_location">$!content.clearyStats - $!content.location</span>
      <span class="reported_occoured">Reported: $!content.reported - Occurred: $!content.occurred</span>
      #if($UtilMethods.isSet($content.description))
        <h4>Description</h4>
        $!{content.description}
      #end
      #if($UtilMethods.isSet($content.disposition))
        <h4>Disposition</h4>
        $!{content.disposition}
      #end
    </div>
  #end
#else
  Found 0 Results.
#end

Like I said, you should be fairly familiar with doing simple pulls like this. If not, then jump over to the doc site and freshen up on that first. Now let’s talk about searching. The first thing we are going to need for a search is a search form:

## Search Form
<form method="get" action="$VTLSERVLET_URI">
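  <label for="basic_search">Search: </label>
  <input type="text" id="basic_search" name="basic_search" value="$!{request.getParameter('basic_search')}" />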
  #if($UtilMethods.isSet($request.getParameter('basic_search')))
    <a href="$VTLSERVLET_URI">Clear Your Search</a>
  #end
</form>

Simple enough, but let’s talk about this just a bit. First we have the form. As you can see I am using get. One issue you will find with dotCMS is that if you use post for your forms, they will not work in the back-end while you are editing them. What I usually do is start with get, and then switch to post when I am done with my testing. Next, I am using the $VTLSERVLET_URI variable as the form action. This is the nice way of getting the current URL, and it makes your code more portable.

In the search field I have set the value to $!{request.getParameter('basic_search')}. This line of code gets the search term from the request, much like PHP’s $_REQUEST['variable']. Let’s break this apart. First you will see that I am using the $!{variable} form. In Velocity, if you use $variable or ${variable} and the variable is not set (null), then the reference itself will print to the page. If you use $!variable or $!{variable}, nothing is printed when the variable is not set. Next is $request.getParameter('variable'). This little bit is how you get post or get data from the request.
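
The difference between the two notations can be modeled in a few lines. This is only a rough Python sketch of the printing rule described above, not Velocity itself; the function name render_reference and the sample context are made up for illustration.

```python
def render_reference(name, context, quiet=False):
    """Model how Velocity prints a reference.

    quiet=False mimics $name; quiet=True mimics $!name.
    """
    value = context.get(name)
    if value is None:
        # Not set: the quiet form prints nothing, the regular
        # form prints the reference text itself.
        return "" if quiet else "$" + name
    return str(value)


context = {"offense": "Theft"}  # hypothetical template context
print(render_reference("offense", context))
print(render_reference("search", context))
print(render_reference("search", context, quiet=True))
```

The second call is the case that bites in a query string: an unset $search would end up in the Lucene query as literal text.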

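Next is a little bit of code to detect whether the form has been submitted: #if($UtilMethods.isSet($request.getParameter('basic_search'))). When a search term is present, the form shows a link back to the page without the term so the user can clear the search.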

Now what we need to do is modify the Lucene query to add in the search term. I want to search across ICR Number, Offense, Clery Stats, and Location. To search a field in a Lucene query you need two things: the field must be indexed, and you need its index (database) name, which is the Index Name column in the structure layout above (text1 through text4 here).

## Look for a search term
#set($search = "")
#if($UtilMethods.isSet($request.getParameter('basic_search')))
  #set($st = $request.getParameter('basic_search'))
  #set($search = "+(text1:$!{st}* text2:$!{st}* text3:$!{st}* text4:$!{st}*)")
#end

## Pull The Content
#pullContent("$query $!{search}" "$limit" "$orderBy")

First we need to declare a variable to hold the search: #set($search = ""). Though it is not strictly necessary to declare your variables in Velocity, I like to do so anyway. Next we check again whether the form has been submitted. After that we set up the search: #set($search = "+(text1:$!{st}* text2:$!{st}* text3:$!{st}* text4:$!{st}*)"). For this part you need to know a bit about how to write a Lucene query. The Lucene query parser syntax documentation is a great resource for learning the ins and outs of Lucene. The leading + in +() makes the whole parenthesized group required. text1:$!{st}* says search the text1 field for anything starting with $st, and so on for each field. Since I didn’t put a plus sign in front of each term inside the parentheses, the terms are ORed together: a contentlet matches if any one of the fields matches. Here is a little cheat sheet to help you decipher the Lucene syntax:

+(text1:foo text2:bar)
Is equivalent to
IF(text1 == foo OR text2 == bar)

+text1:foo +text2:bar
Is equivalent to
IF(text1 == foo AND text2 == bar)

+text1:foo* The “*” after “foo” is a wildcard. In the current version of dotCMS (v1.7) you cannot put a wildcard at the start of a term. In other words you cannot do: +text1:*foo*. This shortcoming is fixed in v1.9, but for us v1.7 folks, there is not much we can do to get around it for now. I’ll talk a little more about dotCMS’s Lucene Indexing later in this post.
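
If the cheat sheet feels abstract, here is a small Python model of the same rules. It is a simplification under stated assumptions: fields are split into tokens on whitespace only (a real Lucene analyzer does more), and the function names and the sample entry are made up for illustration.

```python
def clause_matches(record, field, term):
    """True if any token in the field matches; a trailing * means 'starts with'."""
    tokens = str(record.get(field, "")).lower().split()
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        return any(token.startswith(prefix) for token in tokens)
    return term in tokens


def matches_any(record, clauses):
    """+(a b): the group is required, any one clause inside satisfies it (OR)."""
    return any(clause_matches(record, field, term) for field, term in clauses)


def matches_all(record, clauses):
    """+a +b: every clause is required (AND)."""
    return all(clause_matches(record, field, term) for field, term in clauses)


# Hypothetical log entry keyed by index name.
entry = {"text1": "2010-0042", "text2": "Theft from building"}
clauses = [("text1", "2010*"), ("text2", "burglary")]
print(matches_any(entry, clauses))  # True
print(matches_all(entry, clauses))  # False
```

The entry matches the OR form because text1 starts with “2010”, but fails the AND form because no token in text2 is “burglary”.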

The last line of the Velocity snippet above (the #pullContent call) adds the search we just generated to the query. I used the $!{} syntax again so that, if there is nothing in the search, the variable name does not end up in the query.
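
To see the string that ends up being passed to pullContent, here is a Python sketch that mirrors the Velocity logic. It only models the string building; the function names are made up, and the empty-term check is a simplified stand-in for $UtilMethods.isSet.

```python
SEARCH_FIELDS = ("text1", "text2", "text3", "text4")


def build_search_clause(term, fields=SEARCH_FIELDS):
    """Return the search clause, or "" when no term was submitted."""
    if not term:
        return ""
    return "+(" + " ".join(f"{field}:{term}*" for field in fields) + ")"


def build_query(base_query, term):
    """Join base query and search clause the way "$query $!{search}" does."""
    return f"{base_query} {build_search_clause(term)}"


base = "+structureInode:636931 +live:true"
print(build_query(base, "theft"))
print(build_query(base, None))
```

With a term, the clause is appended after the base query; without one, the query is just the base query followed by a harmless trailing space.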

And that is all there is to it. We can pull off pretty advanced searches with just this simple code. You can also combine this same code with the pageContent macro to get paginated results. Here is the final code:

## Set up Lucene Query
#set($query = "+structureInode:636931 +live:true")
#set($orderBy = "text1 desc")
#set($limit = '30')

## Search Box For the user to search for something:
<form method="get" action="$VTLSERVLET_URI">
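  <label for="basic_search">Search: </label>
  <input type="text" id="basic_search" name="basic_search" value="$!{request.getParameter('basic_search')}" />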
  #if($UtilMethods.isSet($request.getParameter('basic_search')))
    <a href="$VTLSERVLET_URI">Clear Your Search</a>
  #end
</form>

## Look for a search term
#set($search = "")
#if($UtilMethods.isSet($request.getParameter('basic_search')))
  #set($st = $request.getParameter('basic_search'))
  #set($search = "+(text1:$!{st}* text2:$!{st}* text3:$!{st}* text4:$!{st}*)")
#end

##Pull The Content
#pullContent("$query $!{search}" "$limit" "$orderBy")

## Loop through Content
#if($list.size() > 0)
  #foreach($content in $list)
   <div class="entry">
    #editContentlet($content.inode)
     <h3>
     $!{content.icrNumber}
      <b>$!content.offense</b>
     </h3>
    <p>$!content.clearyStats - $!content.location</p>
    <p>Reported: $!content.reported - Occurred: $!content.occurred</p>
    #if($UtilMethods.isSet($content.description))
      <h4>Description</h4>
      <p> $!{content.description} </p>
    #end
    #if($UtilMethods.isSet($content.disposition))
      <h4>Disposition</h4>
      <p> $!{content.disposition} </p>
    #end
   </div>
  #end
#else
  <p> Your Search Found 0 Results.  Please Try Again.<br /><a href="$VTLSERVLET_URI">Clear your Search</a></p>
#end

Advanced Topics

Taking this concept a bit further you can search on categories and tags. Both are a bit tricky if you don’t know the tricks. Let’s start with Categories.

Category Searches

The first thing you need for a category search is a category listing. Let’s add a category field to our structure. If you are not sure how to create a category and add a category field to a structure, consult the dotCMS documentation site. I created a category called Audience. The children of this category are: Student, Faculty, Staff, Visitor. You will need to make sure you give your category a unique key for this. If you click the category name to edit it, you will see the unique key field. I entered “audience”. Then I created a category field on my structure called “Audience”. Here is the code for a category listing:

  <ul id="categories">
    <li class="heading">Categories:</li>
    #set($cats = $categories.getChildrenCategoriesByKey('audience'))
    #foreach($audience in $cats)
      <li #if($UtilMethods.isSet($request.getParameter('c')) && $request.getParameter('c') == $audience.getInode()) class="selected" #end>
      	<a href="${VTLSERVLET_URI}?c=${audience.getInode()}">${audience.getCategoryName()}</a></li>
    #end
  </ul>

A couple of things are going on here that you might not be familiar with. First, to get the categories we use the CategoryWebAPI ViewTool. That viewtool has a method called getChildrenCategoriesByKey, which returns an array of all the children of the category with the unique key that you specify. We then loop through the array and build links to each category.
#if($UtilMethods.isSet($request.getParameter('c')) && $request.getParameter('c') == $audience.getInode()) class="selected" #end
That little bit says, if we have a category in the request, and it happens to be the current category, add the selected class to the list item.
${VTLSERVLET_URI}?c=${audience.getInode()}
That is how we link to the current page and add the category inode to the request.
${audience.getCategoryName()}
This last part is how you get the Category’s name from the Category Object.

Now that we have a listing of all the categories, we need to check whether there is a category in the request and, if there is, build the Lucene query.

#if($UtilMethods.isSet($request.getParameter('c')))
  #set($search = "+c${request.getParameter('c')}c:on")
#end

Simple enough? It may look a little odd, but the way to search for all contentlets that have a particular category selected is with the following syntax: +c[inode]c:on, where [inode] is the inode of the category. Now you know, and knowing is half the battle!
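
As a quick illustration of that syntax, here is a Python sketch of the clause for a given inode. The function name, the sample inode 12345, and the character check are my own assumptions: the check simply refuses values that contain anything other than letters, digits, or hyphens, since the value comes straight from the request.

```python
import re


def category_clause(inode):
    """Build the 'category selected' clause, or "" for a suspicious value."""
    inode = str(inode).strip()
    # Assumption: an inode only contains letters, digits, and hyphens.
    if not re.fullmatch(r"[A-Za-z0-9-]+", inode):
        return ""
    return f"+c{inode}c:on"


print(category_clause("12345"))  # +c12345c:on
```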

Tag Searching

Tag searches are a bit more complicated. First off, let’s add a tag field to our structure and call the field Tags. Then let’s add a tag cloud to our page:

  <div id="tag_cloud">
    #tagCloud("Clery Log Entry" "${VTLSERVLET_URI}" 40)
  </div>

That couldn’t be easier, right? The first parameter to the tagCloud macro is the name(s) of the structure(s) you want to show the tag cloud for. The next parameter is the URL of the page you want the tag cloud to point to. The last parameter is a limit on how many tags you want on your page. Take a moment to see how the URLs for the tag cloud are formed; you will notice that they use the request variable “tag”. Hold on tight now, this is how you construct your tag search:

#if($UtilMethods.isSet($request.getParameter('tag')))
    #set($tagNameStr = $request.getParameter('tag'))
    #set($tagName    = $tagNameStr.replace(' ','* AND text_area3:'))
    #set($search     = "+(+text_area3:${tagName}*)")
#end

While the code is not that hard, why we have to do it this way is a little harder to explain. To understand it, you have to know that dotCMS uses a token-based index. In other words, each word of a field is indexed as a separate token. So a tag like “news release” is actually indexed in dotCMS as 2 separate tokens, and what we need to search for is a record that contains both the token “news” AND the token “release”: +text_area3:news* +text_area3:release* (for this purpose, requiring each term with + and joining the terms with AND give the same result). Here text_area3 is the index name of the Tags field in my structure; use whatever index name your own tag field was given.
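
Here is what the replace trick produces, as a Python sketch of the same string building. The function name is made up, and unlike the Velocity version it splits on any run of whitespace rather than on single spaces.

```python
def tag_clause(tag, field="text_area3"):
    """Require every word of the tag as a prefix match on the tag field."""
    words = tag.split()
    if not words:
        return ""
    inner = " AND ".join(f"{field}:{word}*" for word in words)
    return f"+(+{inner})"


print(tag_clause("news release"))
# +(+text_area3:news* AND text_area3:release*)
```

A single-word tag simply becomes +(+text_area3:news*), so the same code path covers both cases.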

Wrapping Up

Lucene can be a bit difficult to grasp at first, but I hope my post helps to ease your fears about it. Lucene definitely has some shortcomings. First, I have already mentioned that you can’t use a wildcard as the first character of your search. This limits you to doing “Contains a Word (Token) that Starts With” type searching. You will not be able to do a full text search. However, in v1.9 the guys at dotCMS have gone a long way to mitigating these kinds of shortcomings. They tell me that pretty soon we will even be able to generate more SQL-like queries that will be translated to Lucene for us. That would certainly go a long way to easing the entry barrier for working with dotCMS content.


Photo credit: Stéfan (Attribution-NonCommercial-ShareAlike, some rights reserved)


8 Responses to “Searching Content in dotCMS”

  1. Stephen Bell says:

    A few code questions (possible typos?):

    1. Did you mean to use “&gt;” in line 10 of your first code snippet? Is there some reason a regular “>” wouldn’t work instead of the HTML special character?
    2. You said:

    “The last line of the above code adds the search we just generated to the query in the pullContent. I used the $!{} syntax again, because if there is nothing in the search, it will not put the variable name into the query.”

    But it looks like you missed the exclamation point (line 9):
    #pullContent(“$query ${search}” “$limit” “$orderBy”)

  2. Chris Falzone says:

    Stephen,

    Thanks for spotting my typos. You are right on both counts. It should be “>”, not the HTML special character, and I did miss the “!” in my code examples. Good eye! I have the code examples fixed up now.

  3. Henry says:

    Chris,

    This tutorial was quite helpful in explaining a few things. I’m going to book mark it for future reference. Thanks for the help.

    Henry

  4. nathan says:

    Am I missing something? I don’t see any tags in your search form.

  5. milan says:

    Hi Chris tnx for nice and very useful article.
    How to search multiple structure at the same time?

  6. Michael Fienen says:

    Milan, just use an OR statement in the query where you have the structure listed, followed by the rest of your query. It’d look something like
    +(structureName:yourFirstStructure structureName:yourSecondStructure)

  7. milan says:

    Tnx Michael. It’s working 🙂

  8. milan says:

    Michael, how to make a search filter in dotcms 1.7? Is there any doc?
