Google Developer Relations
Introduction
The previous class covered the basics of defining, submitting, and processing a search query. The Search API supports more complex queries, letting you specify the point in the index at which the query should start, how the results should be sorted and formatted, and what information about the documents should be returned. It also supports Geosearch (location-based queries).
In this lesson, we'll look in more detail at some of these features. You'll learn how to:
- Define and process the results of complex queries
- Control which document fields are returned from a query
- Use offsets and limits to control where a query starts and how many results are returned
- Construct and use location-based queries (Geosearch)
See the Search API documentation for more detail on the features described in this lesson, as well as some additional capabilities that we won't cover here.
Objectives
Learn how to perform complex Search API search queries
Prerequisites
The precursor to this class, Getting Started with the Python Search API
You should also have:
- Python 2.7 and the Google App Engine SDK for Python
- Familiarity with Python and the basics of App Engine applications
Query options
The constructor for class Query accepts an optional QueryOptions object as an
argument, allowing you to configure a wide range of options:
search_query = search.Query(
    query_string=query.strip(),
    options=search.QueryOptions(...)
)
Consider one of the QueryOptions configurations used in the example product
search application:
search_query = search.Query(
    query_string=query.strip(),
    options=search.QueryOptions(
        limit=doc_limit,
        offset=offsetval,
        sort_options=sortopts,
        snippeted_fields=[docs.Product.DESCRIPTION],
        returned_expressions=[search.FieldExpression(
            name='adjusted_price',
            expression='max(price, 14.99)')],
        returned_fields=[docs.Product.PID, docs.Product.DESCRIPTION,
                         docs.Product.CATEGORY, docs.Product.AVG_RATING,
                         docs.Product.PRICE, docs.Product.PRODUCT_NAME]
    ))
This specifies an offset (where to start the query) and a limit (the maximum number of results to return), some sort options (discussed in the next lesson), a list of snippeted fields, a list of returned expressions (computed fields), and a list of returned fields. Let's look at what each of these options does.
Query offsets, limits, and cursors
To control the number of results a query returns, use the QueryOptions
constructor's limit parameter. The example product search application uses
limit to return a maximum of three results per page.
The example above also shows the use of the offset parameter. The offset
specifies the number of matched documents to skip before beginning to return
results:
search.QueryOptions(
    limit=doc_limit,
    offset=offsetval,
    ...)
One common use for the offset and limit parameters is to paginate the query
results. To implement pagination, you need to know the total number of matches
the query found and how many have been returned so far. You can get that
information from the returned SearchResults object:
number_found = search_results.number_found
returned_count = len(search_results.results)
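As a minimal sketch of the pagination arithmetic described above, the helper below derives the offsets for the previous and next pages from number_found, the current offset, the number of results returned, and the limit. The function name page_offsets and the sample numbers are hypothetical and are not taken from the example application.

```python
def page_offsets(number_found, offset, returned_count, limit):
    """Return (prev_offset, next_offset) for offset-based paging.

    Either value is None when there is no such page. This helper is a
    hypothetical illustration, not part of the Search API.
    """
    # The next page starts right after the last result we just received.
    next_offset = offset + returned_count
    if returned_count == 0 or next_offset >= number_found:
        next_offset = None  # nothing left to show
    # The previous page starts one page back, but never before the first match.
    prev_offset = max(offset - limit, 0) if offset > 0 else None
    return prev_offset, next_offset


# Example: 7 matches shown 3 per page, currently on the second page.
prev_offset, next_offset = page_offsets(number_found=7, offset=3,
                                        returned_count=3, limit=3)
print('previous page offset: %s, next page offset: %s'
      % (prev_offset, next_offset))
```

The offset chosen for the requested page would then be passed as the offset argument of QueryOptions, together with the same limit.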
The Search API also supports query cursors. A cursor is another way to indicate the point from which to begin a query, allowing you to continue a search from the end of the previous result set. Using a cursor is generally more efficient than using offsets. However, unlike the Datastore API, the Search API doesn't currently support a "reverse cursor", which makes it more difficult to implement backward paging. For this reason, the example application uses offsets rather than cursors to paginate its query results. You can find an example using cursors here.
Snippeting
Snippeted fields allow you to return an abbreviated portion of a field instead
of its full content. The returned snippet will include the fragment of the field
on which the match occurred, with the matched search terms highlighted in bold.
In the product search application (with default data), a search on the query
stories returns three matches, in the documents' description fields. Because
we requested that description be snippeted, the snippet expressions in the
results have the word "stories" highlighted.
You specify the snippeting that should occur by providing an iterable of field
names to snippet. The QueryOptions constructor above requests snippeting of
the DESCRIPTION field:
search.QueryOptions(
    snippeted_fields=[docs.Product.DESCRIPTION],
    ...)
Then, when processing your query results, you access the generated snippets via
a returned document's expressions property:
for doc in search_results:
    ...
    for expr in doc.expressions:  # iterate over the computed fields
        if expr.name == docs.Product.DESCRIPTION:
            description_snippet = expr.value
            break
    # ... do something with the document ...
The expressions property holds a list of computed fields that are the
results of expressions requested in the query. The code above grabs the snippet
generated for the DESCRIPTION field, where doc is a scored document. Scored
documents are returned from a search. In addition to document content, they
include the document score, as well as computed fields (discussed below) and
other information.
Returned expressions and expression functions
The returned_expressions query option allows you to define computed fields,
based on your document fields, that will be returned as part of a scored
document in the search results.
Suppose you want to compute and display a price for each product that includes
an 8% sales tax. You create a field expression with the name adjusted_price,
whose expression is the string 'price * 1.08':
search.QueryOptions(
    returned_expressions=[search.FieldExpression(
        name='adjusted_price',
        expression='price * 1.08')],
    ...)
This expression tells the Search API to return, as the value of
adjusted_price, the value of the price field multiplied by 1.08. The Search
API provides a variety of built-in expression functions that you can use in
such expressions. For example, you can define expressions like
'max(price, 9.99)'.
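To get a feel for the values these two expressions produce, the same arithmetic can be reproduced locally in plain Python. This is only an illustration: the Search API evaluates the expression string itself, and the function names and sample prices below are invented.

```python
# Plain-Python stand-ins for the expression strings 'price * 1.08' and
# 'max(price, 9.99)'. The names and sample prices are hypothetical.

def with_sales_tax(price, rate=0.08):
    """Mirror of 'price * 1.08': the price plus an 8% sales tax."""
    return price * (1 + rate)


def floor_price(price, minimum=9.99):
    """Mirror of 'max(price, 9.99)': the price, but never below the minimum."""
    return max(price, minimum)


for price in (5.00, 25.00):
    print('price=%.2f  with tax=%.2f  floored=%.2f'
          % (price, with_sales_tax(price), floor_price(price)))
```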
After including a returned_expressions list in your QueryOptions object, you
can access that computed field in the documents returned from the search query,
again via the expressions property:
for doc in search_results:
    ...
    for expr in doc.expressions:  # iterate over the computed fields
        if expr.name == docs.Product.DESCRIPTION:  # get the description snippet
            description_snippet = expr.value
        elif expr.name == 'adjusted_price':  # get the adjusted price
            price = expr.value
    # ... do something with the document ...
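When a query requests several snippets and computed fields, one alternative to the if/elif chain above is to collect the expressions into a dictionary keyed by name. The sketch below assumes only that each expression has name and value attributes, as in the loop above. FakeExpression and FakeScoredDoc are stand-ins so the example runs without the App Engine SDK, and the field name 'description' and the sample values are assumptions.

```python
from collections import namedtuple

# Stand-ins for the objects a search returns; only the attributes used in
# the loop above (expressions, name, value) are modelled here.
FakeExpression = namedtuple('FakeExpression', ['name', 'value'])
FakeScoredDoc = namedtuple('FakeScoredDoc', ['expressions'])


def expressions_by_name(doc):
    """Map each computed field's name to its value for one scored document."""
    return dict((expr.name, expr.value) for expr in doc.expressions)


# Sample data invented for illustration.
doc = FakeScoredDoc(expressions=[
    FakeExpression('description', 'short <b>stories</b> for children'),
    FakeExpression('adjusted_price', 16.19),
])

computed = expressions_by_name(doc)
description_snippet = computed.get('description')  # None if not requested
price = computed.get('adjusted_price')
print('%s / %s' % (description_snippet, price))
```

Using dict.get means a computed field that was not requested in the query simply yields None instead of raising an error.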
Returned fields
The QueryOptions constructor also accepts a returned_fields parameter, which
you can use to make your queries more efficient by requesting only the specific
document fields you intend to use. For example, the QueryOptions object shown
earlier requests all the "core" product fields except for the date of the last
update, which we've decided not to show in our result summary. It also doesn't
request any of the category-specific fields, such as publisher for book
documents or tv_type for hd_television documents:
search.QueryOptions(
    returned_fields=[docs.Product.PID, docs.Product.DESCRIPTION,
                     docs.Product.CATEGORY, docs.Product.AVG_RATING,
                     docs.Product.PRICE, docs.Product.PRODUCT_NAME],
    ...)
The returned_fields argument should be an iterable over the names of fields to
return in search results. The documents returned in the search results will
include only the specified fields, even though the indexed documents can include
other fields.
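Because the returned documents carry only the requested fields, code that processes the results should not assume every indexed field is present. The following sketch looks a field up by name and falls back to a default when it is missing. It assumes that a returned document exposes a fields list whose items have name and value attributes. FakeField and FakeReturnedDoc are stand-ins so the example runs without the App Engine SDK, and the field names and values are invented.

```python
from collections import namedtuple

# Stand-ins for a returned document and its fields (assumed attributes:
# doc.fields, field.name, field.value).
FakeField = namedtuple('FakeField', ['name', 'value'])
FakeReturnedDoc = namedtuple('FakeReturnedDoc', ['fields'])


def field_value(doc, name, default=None):
    """Return the value of the named field, or default if it was not returned."""
    for field in doc.fields:
        if field.name == name:
            return field.value
    return default


# A document as it might come back when only two fields were requested.
doc = FakeReturnedDoc(fields=[
    FakeField('pid', 'p-001'),
    FakeField('price', 14.99),
])

print(field_value(doc, 'price'))                       # a requested field
print(field_value(doc, 'publisher', default='(n/a)'))  # not requested
```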
Location-based queries (Geosearch)
The Search API's support for Geosearch allows you to make location-based queries. These allow you, for example, to find nearby stores or restaurants, or nearby activity stream updates.
To execute a location-based query, you need three pieces of information:
- A location, in latitude and longitude coordinates, from which to measure distances.
- The radius within which to search (such as 45 kilometers).
- The set of points to which to measure distances.
The first two of these items are often supplied by the user. The last comes from the indexed documents themselves: in our example product search application, it consists of the locations of our stores, taken from the store location documents we built in the previous Getting Started class.
To search for store locations near the user, the example application obtains the user's location via the browser, and the user inputs the distance within which to search. The distance is converted to meters, the unit of distance used by the Search API. Suppose the user's location is (-33.857, 151.215), and they specify a search radius of 45 kilometers. The application would construct a query string like
"distance(store_location, geopoint(-33.857, 151.215)) < 45000"
and pass it to the Index.search method:
from google.appengine.api import search
...
# a query string like this comes from the client
query = "distance(store_location, geopoint(-33.857, 151.215)) < 45000"
try:
    index = search.Index(config.STORE_INDEX_NAME)
    search_results = index.search(query)
    for doc in search_results:
        # process doc ...
        pass
except search.Error:
    # ...
    pass
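The query string above can be assembled from the user's coordinates and a radius given in kilometers. The helper below is a minimal sketch of that step: it range-checks the coordinates, converts the radius to meters, and formats the distance expression. The name build_distance_query is hypothetical and is not taken from the example application.

```python
def build_distance_query(field_name, latitude, longitude, radius_km):
    """Build a query string matching documents within radius_km of a point.

    Hypothetical helper: field_name is the geopoint field to measure to
    (store_location in the example above).
    """
    # Coordinates usually come from the client, so check their ranges first.
    if not -90.0 <= latitude <= 90.0:
        raise ValueError('latitude must be between -90 and 90')
    if not -180.0 <= longitude <= 180.0:
        raise ValueError('longitude must be between -180 and 180')
    if radius_km <= 0:
        raise ValueError('radius must be positive')
    # The distance comparison is expressed in meters.
    radius_m = int(round(radius_km * 1000))
    return 'distance(%s, geopoint(%s, %s)) < %d' % (
        field_name, latitude, longitude, radius_m)


query = build_distance_query('store_location', -33.857, 151.215, 45)
print(query)
```

The resulting string can be passed to Index.search in the same way as the literal query shown earlier.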
Summary and review
In this lesson, we've learned how to specify a search query using a
QueryOptions object, and we've looked at some useful QueryOptions
properties: limit and offset, snippeted_fields, returned_expressions, and
returned_fields. We've also described how to construct a Geosearch query.
One important QueryOptions property, sort_options, has enough features to
merit its own lesson, so we'll discuss it next. See the QueryOptions
documentation for additional options not covered in this lesson.
To check your understanding, try playing with some of the QueryOptions
properties described here. For instance, change the DOC_LIMIT in the
config.py file to a larger value. This is the value passed as the
QueryOptions limit argument.
Try playing with the returned_expressions feature. returned_expressions
should have been defined in _buildQuery() like this:
search.FieldExpression(name='adjusted_price',
                       expression='price * 1.08')
Look for the lines in handlers.py, in class ProductSearchHandler, that say
# uncomment to use 'adjusted price', which should be
# defined in returned_expressions in _buildQuery() below, as the
# displayed price.
Uncomment the lines below them:
# elif expr.name == 'adjusted_price':
#     price = expr.value
When you redeploy the application, you should see the adjusted_price displayed
in the search results instead of the actual price. That is, the price displayed
will include the sales tax. The View product details link in the search
results will still show you the actual price. (The adjusted_price field will
be populated only for a deployed application.)
In the next lesson, you'll learn how to sort the results of a query in the order you want.