Working with Queries and Scans
Topics
In addition to using primary keys to access items, Amazon DynamoDB also provides two operations
for searching the data: Query and Scan.
Tip
You can query or scan a secondary index in the same way that you query a table. To do this, you
must provide IndexName and TableName as
parameters to Query or Scan. For more
information, see Improving Data Access with Secondary Indexes.
For best practices, see Guidelines for Query and Scan.
Query
A Query operation finds items in a table or a secondary index using only
primary key attribute values. You must provide a partition key name and a
value to search for. You can optionally provide a sort key name and value,
and use a comparison operator to refine the search results. By default, a
Query operation returns all of the data attributes for items
with the specified primary key(s); however, you can use the
ProjectionExpression parameter so that the
Query operation only returns some of the attributes, rather
than all of them.
In a Query operation, you use the
KeyConditionExpression parameter to determine the items to be
read from the table or index. You must specify the partition key name and value as an
equality condition. You can optionally provide a second condition for the sort
key (if present). The sort key condition must use one of the following comparison
operators:
— true if the attributea=bais equal to the valueb— true ifa<bais less thanb— true ifa<=bais less than or equal tob— true ifa>bais greater thanb— true ifa>=bais greater than or equal tob— true ifaBETWEENbANDcais greater than or equal tob, and less than or equal toc.
The following function is also supported:
begins_with (— true if the value of attributea,substr)begins with a particular substring.a
The following are some examples of key condition expressions. Note that these expressions
use placeholders (such as :name and :subj) instead of actual
values. For more information, see Expression Attribute Names and Expression Attribute Values.
Query the Thread table for a particular ForumName (partition key). All of the items with that ForumName value will be read by the query, because the sort key (Subject) is not included in
KeyConditionExpression.ForumName = :nameQuery the Thread table for a particular ForumName (partition key), but this time return only the items with a given Subject (sort key).
Forum = :name and Subject = :subjQuery the Reply table for a particular Id (partition key), but return only those items whose ReplyDateTime (sort key) begins with certain characters.
Id = :id and begins_with(ReplyDateTime, :dt)
You can use any attribute name in a key condition expression, provided that the first
character is a-z or A-Z and the second character (if present)
is a-z, A-Z, or 0-9. In addition, the attribute
name must not be a DynamoDB reserved word. (For a complete list of these, see Reserved Words in DynamoDB.) If an attribute name does
not meet these requirements, you will need to define an expression attribute name as a
placeholder. For more information, see Expression Attribute Names.
For items with a given partition key value, DynamoDB stores these items close
together, in sorted order by sort key value. In a Query operation, DynamoDB
retrieves the items in sorted order, and then processes the items using
KeyConditionExpression and any FilterExpression
that might be present. Only then are the Query results sent back to
the client.
A Query operation always returns a result set. If no matching
items are found, the result set will be empty.
Query results are always sorted by the sort key value. If the data
type of the sort key is Number, the results are returned in numeric order;
otherwise, the results are returned in order of UTF-8 bytes. By
default, the sort order is ascending. To reverse the order, set the
ScanIndexForward parameter to false.
A single Query or Scan operation can
retrieve a maximum of 1 MB of data. This limit applies before any FilterExpression is applied to the results. If LastEvaluatedKey is
present in the response and is non-null, you will need to paginate the result set (see Paginating the Results).
Scan
A Scan operation reads every item in a table or a secondary index. By
default, a Scan operation returns all of the data attributes for
every item in the table or index. You can use the
ProjectionExpression parameter so that
Scan only returns some of the attributes, rather than all of
them.
Scan always returns a result set. If no matching items are
found, the result set will be empty.
A single Scan request can retrieve a maximum of
1 MB of data; DynamoDB can optionally apply a filter expression to this data,
narrowing the results before they are returned to the user. (For more information on
filters, see Filtering the Results from a Query or a Scan.)
Filtering the Results from a Query or a Scan
With a Query or a Scan operation, you can
provide an optional filter expression to refine the results returned to you. A
filter expression lets you apply conditions to the data after
it is queried or scanned, but before it is returned to you. Only the items that meet
your conditions are returned.
The following are some examples of filter expressions. Note that these expressions use
placeholders (such as :num and :name) instead of actual
values. For more information, see Expression Attribute Names and Expression Attribute Values.
Query the Thread table for a particular ForumName (partition key) and Subject (sort key). Of the items that are found, return only the most popular discussion threads—for example, those threads with more than a certain number of Views.
#V > :numNote that Views is a reserved word in DynamoDB (see Reserved Words in DynamoDB), so we use an expression attribute name as a substitution.
Scan the Thread table and return only the items that were last posted to by a particular user.
LastPostedBy = :name
Note
The syntax for FilterExpression is identical to that of
ConditionExpression. In addition, FilterExpression
uses the same comparators, functions, and logical operators as
ConditionExpression.
For Query operations, you cannot define a
FilterExpression based on a partition key or a sort key. (This
does not apply to Sort operations.)
For more information about ConditionExpression syntax, see Condition Expression Reference.
A single Query or Scan operation can
retrieve a maximum of 1 MB of data. This limit applies before any filter
expression is applied to the results.
Capacity Units Consumed by Query and Scan
When you create a table, you specify your read and write capacity unit requirements. If you add a global secondary index to the table, you must also provide the throughput requirements for that index.
You can use Query and Scan operations on
secondary indexes in the same way that you use these operations on a table. If you use
Query or Scan on a local secondary index, then capacity
units are consumed from the table's provisioned throughput. However, if you perform
these operations on a global secondary index, capacity units are consumed from the provisioned throughput
of the index. This is because a global secondary index has its own provisioned throughput settings,
separate from those of its table.
For more information about how DynamoDB computes the capacity units consumed by your operation, see Capacity Unit Calculations.
Note
For Query and Scan operations, DynamoDB
calculates the amount of consumed provisioned throughput based on item size, not on
the amount of data that is returned to an application. For this reason, the number
of capacity units consumed will be the same whether you request all of the
attributes (the default behavior) or just some of them using the
ProjectionExpression parameter.
The number of capacity units consumed will also be the same whether or not you
specify a FilterExpression operation.
Paginating the Results
DynamoDB paginates the results from Query
and Scan operations. With pagination, Query
and Scan results are divided into distinct pieces; an application
can process the first page of results, then the second page, and so on. The data
returned from a Query or Scan operation is
limited to 1 MB; this means that if the result set exceeds
1 MB of data, you'll need to perform another Query or
Scan operation to retrieve the next 1 MB of
data.
If you query or scan for specific attributes that match values that amount to more
than 1 MB of data, you'll need to perform another
Query or Scan request for the next
1 MB of data. To do this, take the LastEvaluatedKey
value from the previous request, and use that value as the
ExclusiveStartKey in the next request. This approach will let
you progressively query or scan for new data in 1 MB increments.
When the entire result set from a Query or
Scan has been processed,
LastEvaluatedKey is null. This
indicates that the result set is complete (that is, the operation processed the “last
page” of data).
If LastEvaluatedKey is anything other than null, this does not necessarily mean that
there is more data in the result set. The only way to know when you have reached the end
of the result set is when LastEvaluatedKey is null.
Limit
The DynamoDB Query and Scan APIs allow a
Limit value to restrict the size of the results.
In a request, set the Limit parameter to the number of items that
you want DynamoDB to process before returning results.
In a response, DynamoDB returns all the matching results within the scope of the
Limit value. For example, if you issue a
Query or a Scan request with a
Limit value of 6 and without a filter
expression, DynamoDB returns the first six items in the table that match the specified key
conditions in the request (or just the first six items in the case of a
Scan with no filter). If you also supply a
FilterExpression value, DynamoDB will return the items in the
first six that also match the filter requirements (the number of results returned will
be less than or equal to 6).
For either a Query or Scan operation,
DynamoDB might return a LastEvaluatedKey value if the operation did
not return all matching items in the table. To get the full count of items that match,
take the LastEvaluatedKey value from the previous request and use
it as the ExclusiveStartKey value in the next request. Repeat
this until DynamoDB no longer returns a LastEvaluatedKey
value.
Counting the Items in a Response
In addition to the items that match your criteria, the response from a
Query or Scan operation contains the
following elements:
ScannedCount— the number of items that were queried or scanned, before any filter expression was applied to the results.Count— the number of items that were returned in the response.
If you do not use a FilterExpression, then ScannedCount
and Count will have the same value.
Suppose that you have a table named TestTable with 1000 items,
and the size of each item is exactly 250 bytes. If you were to Scan this
table (with no filter), then ScannedCount and Count would
have the same value (1000). Now suppose that you added a FilterExpression
so that only 300 items would be returned. DynamoDB would still read all 1000 items from
TestTable, but would discard all of these results except for the 300
items that matched your criteria. In this case, ScannedCount would be 1000,
but Count would be 300.
If the size of the result set exceeds 1 MB, then
ScannedCount and Count will represent
only a partial count of the total items. Suppose that TestTable now
has one million items, and the size of each item is 250 kilobytes. In this case, you
would need to perform multiple Scan requests in order to scan the entire
table. (For more information, see Paginating the Results.) Each response would contain the ScannedCount and
Count for the items processed by that particular
Scan request. To obtain grand totals for all of the Scan
requests, you could to keep a running tally of both ScannedCount
and Count.
Read Consistency
Read Consistency for Query
A Query result is an eventually consistent read, but you can
request a strongly consistent read instead. An eventually consistent read might not
reflect the results of a recently completed PutItem or
UpdateItem operation. For more information, see Read Consistency.
Read Consistency for Scan
When you issue a Scan request, DynamoDB uses eventually
consistent reads. This means that changes to data in the table immediately before
the scan takes place might not be included in the scan results. If you need a
consistent copy of the data, as of the time that the Scan
begins, you can set the ConsistentRead parameter to true.
This will ensure that all of the write operations that completed before the
Scan began will be included in the
Scan response. This is useful in table backup or
replication scenarios, in conjunction with DynamoDB Streams: You first use Scan with
ConsistentRead set to true, in order to obtain a
consistent copy of the data in the table. During the Scan,
DynamoDB Streams records any additional write activity that occurs on the table. After the
Scan completes, you can apply the write activity from the stream
to the table.
Note that a Scan operation with
ConsistentRead set to true will consume twice as many
read capacity units, as compared to leaving ConsistentRead at
its default value (false).
Query and Scan Performance
Generally, a Query operation is more efficient than a
Scan operation.
A Scan operation always scans the entire table or secondary index, then
filters out values to provide the desired result, essentially adding the extra step of
removing data from the result set. Avoid using a Scan operation on
a large table or index with a filter that removes many results, if possible. Also, as a
table or index grows, the Scan operation slows. The
Scan operation examines every item for the requested values,
and can use up the provisioned throughput for a large table or index in a single
operation. For faster response times, design your tables and indexes so that your
applications can use Query instead of Scan.
(For tables, you can also consider using the GetItem and
BatchGetItem APIs.).
Alternatively, design your application to use Scan operations in
a way that minimizes the impact on your request rate. For more information, see Guidelines for Query and Scan.
A Query operation searches for a specific range of keys that
satisfy a given set of key conditions. If you specify a filter expression, then DynamoDB
must perform the extra step of removing data from the result set. A
Query operation seeks the specified composite primary key, or
range of keys, until one of the following events occurs:
The result set is exhausted.
The number of items retrieved reaches the value of the
Limitparameter, if specified.The amount of data retrieved reaches the maximum result set size limit of 1 MB.
Query performance depends on the amount of data retrieved, rather
than the overall number of primary keys in a table or secondary index. The parameters for a
Query operation (and consequently the number of matching keys)
determine the performance of the query. For example, a query on a table that contains a
large set of sort key values for a single partition key value can be more efficient
than a query on another table that has fewer sort key values per partition key
value, if the number of matching keys in the first table is fewer than in the second.
The total number of primary keys, in either table, does not determine the efficiency of
a Query operation. A filter expression can also have an impact on
the efficiency of a Query operation, because the items that don't
match the filter must be removed from the result set. Avoid using a
Query operation on a large table or secondary index with a filter that
removes many results, if possible.
If a specific partition key value has a large set of sort key values, and
the results cannot be retrieved in a single Query request, the
ExclusiveStartKey continuation parameter allows you to submit
a new query request from the last retrieved item without reprocessing the data already
retrieved.
Parallel Scan
By default, the Scan operation processes data sequentially. DynamoDB
returns data to the application in 1 MB increments, and an application
performs additional Scan operations to retrieve the next
1 MB of data.
The larger the table or index being scanned, the more time the Scan will take to
complete. In addition, a sequential Scan might not always be able to
fully utilize the provisioned read throughput capacity: Even though DynamoDB
distributes a large table's data across multiple physical partitions, a
Scan operation can only read one partition at a time. For this
reason, the throughput of a Scan is constrained by the maximum
throughput of a single partition.
To address these issues, the Scan operation can logically divide a
table or secondary index into multiple segments, with multiple application workers
scanning the segments in parallel. Each worker can be a thread (in programming languages
that support multithreading) or an operating system process. To perform a parallel scan,
each worker issues its own Scan request with the following
parameters:
Segment —A segment to be scanned by a particular worker. Each worker should use a different value forSegment.TotalSegments —The total number of segments for the parallel scan. This value must be the same as the number of workers that your application will use.
The following diagram shows how a multithreaded application performs a parallel Scan with three degrees of parallelism:

In this diagram, the application spawns three threads and assigns each thread a
number. (Segments are zero-based, so the first number is always 0.) Each thread issues a
Scan request, setting Segment to its designated number
and setting TotalSegments to 3. Each thread scans its designated segment,
retrieving data 1 MB at a time, and returns the data to the application's main
thread.
The values for Segment and TotalSegments apply to individual
Scan requests, and you can use different values at any time. You
might need to experiment with these values, and the number of workers you use, until your
application achieves its best performance.
Note
A parallel scan with a large number of workers can easily consume all of the provisioned throughput for the table or index being scanned. It is best to avoid such scans if the table or index is also incurring heavy read or write activity from other applications.
To control the amount of data returned per request, use the Limit
parameter. This can help prevent situations where one worker consumes all of the
provisioned throughput, at the expense of all other workers. For more information, see
"Reduce Page Size" in Avoid Sudden Bursts of Read Activity.

