Many web applications suffer from cross-site scripting (XSS) vulnerabilities. XSS was number 3 in OWASP's top 10 application security risks in 2013. Closure Templates has features to prevent XSS in your application.
Autoescaping in Closure Templates
XSS vulnerabilities typically occur when dynamic text from an untrusted source is embedded into
an HTML document. Escaping prevents these vulnerabilities. Escaping is the
process of converting text so that it is displayed properly in its context, such as turning angle
brackets into &lt; and &gt; so they are not interpreted as
tags.
The type of escaping needed depends on the context in the document where the value appears. For
example, a value that appears inside a <style> tag needs to be escaped
differently than a value that appears in a URI.
Closure Templates' autoescaping ensures that every dynamic value is escaped in a context-appropriate way.
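As a rough illustration of why the context matters, the sketch below escapes the same value once for HTML text and once for a URI query parameter. The helper names (escapeHtmlSketch, escapeUriComponentSketch) and the sample value are assumptions made up for this example; they are not the functions Closure Templates uses internally.

```javascript
// Hypothetical helpers for illustration; not the Closure Templates implementation.
const HTML_ENTITIES = {'&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'};

// Escapes text so it is displayed literally in HTML text or a quoted attribute.
function escapeHtmlSketch(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

// Escapes text for use as a single URI query parameter value.
function escapeUriComponentSketch(text) {
  return encodeURIComponent(String(text));
}

const value = '<b>Tom & Jerry</b>';
const asHtml = escapeHtmlSketch(value);
const asUriParam = escapeUriComponentSketch(value);

console.log(asHtml);      // &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;
console.log(asUriParam);  // %3Cb%3ETom%20%26%20Jerry%3C%2Fb%3E
```

Neither result is safe in the other context, which is why the escaping has to be chosen per context.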
Strict Autoescaping
The most secure way to use Closure Templates is with strict autoescaping. Strict
templates are recursively guaranteed not to underescape the output. Every last dynamic value
is printed with the correct escaping technique.
The output of a strict template is not a plain string, but a
SanitizedContent object, which associates a content kind with the
text. The content kind represents how the content is intended to be used, and the type of
escaping, if any, that has already been applied to it. This information is particularly
important in cases where the output of one template is used as an input parameter to another
template.
For every dynamic value that appears in the output of a template, Closure Templates identifies the output context at the point of use, determined by the surrounding text.
These two factors (content kind and output context) determine what kind of escaping is applied to the text. For example, if the text has already been URI-escaped, and it's being used in a URI context, then there's no need to escape it again. This prevents "double-escaping" of the text.
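The interaction of content kind and output context can be sketched as follows. This is a simplified model under stated assumptions: sanitized, ESCAPERS, and printInContext are hypothetical names, only two contexts are modeled, and a kind mismatch is handled by plain re-escaping, which is cruder than what the real autoescaper does.

```javascript
// Hypothetical stand-in for a SanitizedContent object: text tagged with a content kind.
function sanitized(content, kind) {
  return {content: String(content), kind: kind};
}

const HTML_ENTITIES = {'&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'};

// Escapers keyed by output context (only two contexts, to keep the sketch short).
const ESCAPERS = {
  html: (text) => text.replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]),
  uri: (text) => encodeURIComponent(text),
};

// Picks the escaping from the content kind of the value and the output context.
function printInContext(value, context) {
  const isSanitized = typeof value === 'object' && value !== null && 'kind' in value;
  if (isSanitized && value.kind === context) {
    return value.content;  // Already escaped for this context: do not escape again.
  }
  // Anything else is coerced to plain text and escaped for the context.
  const text = isSanitized ? value.content : String(value);
  return ESCAPERS[context](text);
}

const plain = printInContext('a b&c', 'uri');
const alreadyEscaped = printInContext(sanitized('a%20b%26c', 'uri'), 'uri');
const doubleEscaped = printInContext('a%20b%26c', 'uri');

console.log(plain);           // a%20b%26c
console.log(alreadyEscaped);  // a%20b%26c
console.log(doubleEscaped);   // a%2520b%2526c
```

The last call shows the double-escaping that occurs when the kind information is lost and the already-escaped text is passed as a plain string.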
Content Kinds
The different content kinds are:
| Content kind | Description | Example | Notes |
|---|---|---|---|
| html | HTML markup | <div>Hello!</div> | |
| attributes | HTML attribute-value pairs | class="foo" width="100%" | Represents the combination of both attribute names and attribute values, and must include the quotation marks around the attribute value. If the template output is intended to be just an attribute value alone (the part inside the quotes) then use either the text or html content kind. |
| text | Plain text, not yet escaped | Hello! | |
| uri | URIs | http://www.google.com/search?q=android | |
| css | Stylesheet text | .myClass{ color: red; display: block; } | |
| js | JavaScript or JSON | {"a": 1, "b": 2} | |
| trusted_resource_uri | A URL which is under application control and from which script, CSS, and other resources that represent executable code can be fetched. | https://www.google.com/test.js | Currently Soy requires trusted_resource_uri for script srcs. In the future, this may apply to other kinds of resources, such as stylesheets. |
The content kind isn't a compiler type; you won't get an error or warning if you use
a text kind in a css context or vice versa. Rather, the content kind
is an indication that the text is safe for a given context and therefore does not need
additional escaping.
For input values that are not SanitizedContent objects, a strict template coerces
the value to a text string, and then applies escaping based on the context.
Usage
Strict autoescaping is on by default for all templates. (You can also explicitly declare it
by adding autoescape="strict" to your
namespace or
template declarations.)
By default, the output of a strict template has kind html. If your template
produces a different kind of content, you must add kind attributes to your
template. For example, a strict template that produces a URI might look like this:
{template .googleUri autoescape="strict" kind="uri"}
http://www.google.com/
{/template}
The kind attribute can be added to the following Closure Templates commands:
| Command | Notes |
|---|---|
| template | Optional. Assumed to be kind="html" if omitted. |
| deltemplate | Optional. Assumed to be kind="html" if omitted. All matching delegates must have the same kind. |
| let | Required only for block form let statements. |
| param | Required only for block form param statements. |
The following example illustrates the usage of the kind attribute:
{template .foo autoescape="strict" kind="html"}
// Block-form 'let' command, 'kind' is required.
{let $message kind="text"}
{msg}Hi, {$name}!{/msg}
{/let}
// Short form 'let', no 'kind' attribute.
{let $category: $categoryList[0] /}
{call .bar}
// Block-form 'param' command, kind is required.
{param attributes kind="attributes"}
title="{$message}"{sp}
onclick="foo('{$message}')"
{/param}
{param content kind="html"}
<b>{$message}</b>
{/param}
// Short-form 'param' command, no 'kind' attribute.
{param visible: true /}
{/call}
{/template}
Short-form commands don't need the kind attribute because they pass values
rather than constructing strings, and values keep whatever kind they already have.
Strict autoescaping can be turned on for some templates and not for others, so you do not need to change all your templates at once. However, it is a good idea to eventually make all your templates use strict autoescaping.
Passing parameters to strict templates
For ordinary content that doesn't contain markup, you can just pass in the string values as template parameters as before, and they will get escaped.
For trusted content that has markup that you don't want re-escaped, wrap the content
in the appropriate SanitizedContent object. It is your responsibility to make sure
that the content is really safe; otherwise you will introduce exactly the kind of XSS
vulnerability that strict mode was designed to prevent.
For JavaScript, the function to wrap an HTML content string is
soydata.VERY_UNSAFE.ordainSanitizedHtml. In Java, the equivalent function
is com.google.template.soy.data.UnsafeSanitizedContentOrdainer#ordainAsSafe.
You might want to place restrictions in your project that limit where and when these wrapper functions can be called, such as limiting these calls to a specific class or package that can easily be searched and audited. Otherwise, it becomes tempting to simply wrap arbitrary, untrusted strings whenever it's convenient in the code, which defeats the whole purpose of strict autoescaping.
Anatomy of an XSS Hack (and its prevention)
Template systems make it easy to compose content from static HTML and dynamic values. Closure Templates' autoescaping makes it even easier by letting you use the same values in many contexts without having to explicitly specify encoding.
An enterprising hacker might try to sneak a malicious value into your template to take over the page via XSS, perhaps using
{ x: 'javascript:/*</style></script>/**/
/<script>1/(alert(1337))//</script>' }
If we pass this to a naive template, like
{template .foo autoescape="deprecated-contextual"}
<a href="{$x|noAutoescape}"
onclick="{$x|noAutoescape}"
>{$x|noAutoescape}</a>
<script>var x = '{$x|noAutoescape}'</script>
<style>
p {
font-family: "{$x|noAutoescape}";
background: url(/images?q={$x|noAutoescape});
left: {$x|noAutoescape}
}
</style>
{/template}
then the attack succeeds. That template produces
<a href="javascript:/*</style></script>/**/ /<script>1/(alert(1337))//</script>"
onclick="javascript:/*</style></script>/**/ /<script>1/(alert(1337))//</script>"
>javascript:/*</style></script>/**/ /<script>1/(alert(1337))//</script></a>
<script>var x = 'javascript:/*</style></script>/**/ /<script>1/(alert(1337))//</script>'</script>
<style>
p {
font-family: "javascript:/*</style></script>/**/ /<script>1/(alert(1337))//</script>";
background: url(/images?q=javascript:/*</style></script>/**/ /<script>1/(alert(1337))//</script>);
left: javascript:/*</style></script>/**/ /<script>1/(alert(1337))//</script>
}
</style>
which pops up "1337" 6 times, and a seventh if you click the link.
Let's take another look at that malicious input to figure out why:
| Input fragment | Effect |
|---|---|
| javascript: | At the beginning of a URL, this changes the rest of the content into JavaScript. In a script statement, this is just an unused label. |
| /*</style></script>/**/ | This breaks out of any style or script element. If already in a script attribute value, this just looks like a comment. It prematurely ends any unquoted attribute value and its containing tag. |
| /<script>1/ | If outside a script, this starts a script tag with a useless division. Inside a script, this is a self-contained regular expression literal. |
| (alert(1337)) | If preceded by a regular expression literal, this tries to call it, but only after executing the real malicious code, alert(1337). |
| //</script> | If inside a script tag, this closes it correctly. If inside a javascript: URL attribute or event handler attribute, this is a harmless comment. |
Many of the pieces of that malicious input depend on being interpreted different ways by different parts of a browser. Autoescaping defangs this and other malicious inputs by choosing a single consistent meaning for a dynamic value, and choosing an escaping scheme that makes sure the browser will interpret it the same way.
So if we pass that same malicious input to an autoescaped template (note that only
the |noAutoescape directives have been removed):
{template .foo autoescape="deprecated-contextual"}
<a href="{$x}"
onclick="{$x}"
>{$x}</a>
<script>var x = '{$x}'</script>
<style>
p {
font-family: "{$x}";
background: url(/images?q={$x});
left: {$x}
}
</style>
{/template}
We get a very different output; one that is altogether saner:
<a href="#zSoyz"
onclick="&#39;javascript:/*&lt;/style&gt;&lt;/script&gt;/**/ /&lt;script&gt;1/(alert(1337))//&lt;/script&gt;&#39;"
>javascript:/*&lt;/style&gt;&lt;/script&gt;/**/ /&lt;script&gt;1/(alert(1337))//&lt;/script&gt;</a>
<script>var x = 'javascript:/*\x3c/style\x3e\x3c/script\x3e/**/ /\x3cscript\x3e1/(alert(1337))//\x3c/script\x3e'</script>
<style>
p {
font-family: "javascript:/*\3c /style\3e \3c /script\3e /**/ /\3c script\3e 1/(alert(1337))//\3c /script\3e ";
background: url(/images?q=javascript%3A%2F%2A%3C%2Fstyle%3E%3C%2Fscript%3E%2F%2A%2A%2F%20%2F%3Cscript%3E1%2F%28alert%281337%29%29%2F%2F%3C%2Fscript%3E);
left: zSoyz
}
</style>
- When {$x} appeared inside HTML text, we entity-encoded it (< → &lt;).
- When {$x} appeared inside a URL or as a CSS quantity, we rejected it because it had a protocol javascript: that was not http or https, and instead output a safe value #zSoyz. Had {$x} appeared in the query portion of a URL, we would have percent-encoded it instead of rejecting it outright (< → %3C).
- When {$x} appeared in JavaScript, we wrapped it in quotes (if not already inside quotes) and escaped HTML special characters (< → \x3c).
- When {$x} appeared inside CSS quotes, we did something similar to JavaScript, but using CSS escaping conventions (< → \3c ).
The malicious output was defanged.
Escaping: the fine details
Substitutions in HTML
When a print command appears where normal HTML text could appear, the result is HTML entity-escaped. For example, the template
<div title="{$shortMessage}">{$longMessage}</div>
given ({
"shortMessage": "I <3 ponies!",
"longMessage": "OMG! <3 <3 <3!"
}) produces
<div title="I &lt;3 ponies!">OMG! &lt;3 &lt;3 &lt;3!</div>
You can safely substitute data anywhere a tag can appear or in a plain attribute value. It's good
practice to quote all your attributes, but if you do forget quotes, the autoescaper makes sure the
attribute value cannot be split by spaces in the dynamic value. Given the input above,
<div title={$shortMessage}> →
<div title=I&#32;&lt;3&#32;ponies!>.
Spaces, which would normally end an unquoted attribute value, are encoded to keep the value
together.
To avoid over-escaping of known safe HTML, you can use sanitized content. The
template <div>{$foo}</div> given
{ foo: new soydata.SanitizedHtml("<b>Foo</b>") }
produces output that is not re-escaped:
<div><b>Foo</b></div>, instead of the
over-escaped version that would have been produced if the
soydata.SanitizedHtml wrapper were not there:
<div>&lt;b&gt;Foo&lt;/b&gt;</div>.
Sanitized content is safe to use with attributes and with elements that cannot contain tags such
as TEXTAREA. The template <div
title="{$foo}">{$foo}</div> given the input above produces a sensible output:
<div title="Foo"><b>Foo</b></div>. When
embedded in an HTML attribute, sanitized content will have tags stripped first.
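The attribute behavior can be approximated with the sketch below. It is an assumption-laden simplification: stripTagsSketch, htmlToAttributeValueSketch, and renderDivSketch are hypothetical names, and removing tags with a regular expression does not cover every malformed-markup case the way a real sanitizer must.

```javascript
// Hypothetical helper: removes anything that looks like a tag. A regular expression
// is a simplification; it does not handle every malformed-markup case.
function stripTagsSketch(html) {
  return String(html).replace(/<[^>]*>/g, '');
}

// Hypothetical helper: makes tag-stripped HTML safe inside a quoted attribute while
// leaving existing entities such as &amp; untouched.
function htmlToAttributeValueSketch(html) {
  const escapes = {'<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'};
  return stripTagsSketch(html).replace(/[<>"']/g, (ch) => escapes[ch]);
}

// Mirrors <div title="{$foo}">{$foo}</div> for a value of kind html.
function renderDivSketch(safeHtml) {
  return '<div title="' + htmlToAttributeValueSketch(safeHtml) + '">' + safeHtml + '</div>';
}

const rendered = renderDivSketch('<b>Foo</b>');
console.log(rendered);  // <div title="Foo"><b>Foo</b></div>
```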
Substitutions in Tag and Attribute Names
Substitutions in tag and attribute names are sanity-checked rather than entity-encoded.
<h{$headerLevel}>Foo</h{$headerLevel}>
→ <h3>Foo</h3> for headerLevel=3 but
for headerLevel='><script>alert(1337)<script' you
get <hzSoyz>Foo</hzSoyz>. You'll also get a log
message in Java, and in JavaScript, if you're running with
closure asserts enabled, you get an assertion failure.
Don't try to dynamically specify special tag names (like script or style) or special
attribute names (like href, style, or onclick).
Trying to use
<{$name}>{$content}</{$name}>
with ({ "name": "script", "content": "alert(1337)" })
or <a {$name}="{$content}">
with ({ "name": "onmouseover", "content": "alert(1337)" }) is
asking for trouble. Since the autoescaper cannot distinguish JavaScript, CSS, or URLs from plain
HTML with those tag and attribute names, it must reject them.
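A name filter of this kind might look like the sketch below. The function name filterHtmlNameSketch, the character whitelist, and the list of special names are assumptions chosen for illustration; the actual rules used by the autoescaper may differ.

```javascript
// Hypothetical list of names whose content is script, style, or URL rather than plain HTML.
const SPECIAL_NAMES = new Set(['script', 'style', 'href', 'src']);

// Returns the value if it looks like a harmless tag or attribute name (or a fragment
// of one), and the placeholder zSoyz otherwise.
function filterHtmlNameSketch(value) {
  const name = String(value);
  const lower = name.toLowerCase();
  const hasOnlySafeChars = /^[a-z0-9-]+$/.test(lower);
  // Names starting with "on" are rejected wholesale to cover event handler attributes.
  const isSpecial = SPECIAL_NAMES.has(lower) || lower.startsWith('on');
  return hasOnlySafeChars && !isSpecial ? name : 'zSoyz';
}

const heading = '<h' + filterHtmlNameSketch(3) + '>Foo</h' + filterHtmlNameSketch(3) + '>';
const attack = '<h' + filterHtmlNameSketch('><script>alert(1337)<script') + '>';

console.log(heading);  // <h3>Foo</h3>
console.log(attack);   // <hzSoyz>
```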
Substitutions in URLs
Values that are substituted into different parts of URIs are treated differently. Substitutions in the query part are URI-escaped.
| Template | Handling | Input | Output |
|---|---|---|---|
| <a href="{$x}"> | Entity-escape and filter out bad protocols. | ({ "x": "http://foo/" }) | <a href="http://foo/"> |
| | | ({ "x": "/foo?a=b&c=d" }) | <a href="/foo?a=b&amp;c=d"> |
| | | ({ "x": "javascript:alert(1337)" }) | <a href="#zSoyz"> |
| <a href="/foo/{$x}"> | Just entity-escape. | ({ "x": "bar" }) | <a href="/foo/bar"> |
| | | ({ "x": "bar&baz/boo" }) | <a href="/foo/bar&amp;baz/boo"> |
| <a href="/foo?q={$x}"> | Percent encode inside query. | ({ "x": "bar&baz=boo" }) | <a href="/foo?q=bar%26baz%3dboo"> |
| | | ({ "x": soydata.VERY_UNSAFE.… | <a href="/foo?q=bar&amp;baz=boo"> |
| | | ({ "x": "A is #1" }) | <a href="/foo?q=A%20is%20%231"> |
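The three positions can be modeled with the sketch below. All function names and the SAFE_URL pattern are assumptions for illustration: the pattern accepts http:, https:, and scheme-less URLs and rejects everything else, which approximates but is not guaranteed to match the autoescaper's actual filter. Note that encodeURIComponent emits uppercase hex digits (%3D rather than %3d); the two forms are equivalent.

```javascript
const HTML_ENTITIES = {'&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'};

// Hypothetical helper: entity-escapes text for a quoted HTML attribute.
function escapeHtmlAttrSketch(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

// Accepts http: and https: URLs and URLs with no scheme at all; rejects everything else.
const SAFE_URL = /^(?:https?:|[^:\/?#]*(?:[\/?#]|$))/i;

// Hypothetical helper: replaces a URL with a disallowed scheme by a harmless placeholder.
function filterUrlSketch(url) {
  const text = String(url);
  return SAFE_URL.test(text) ? text : '#zSoyz';
}

// Value is the whole URL: filter the scheme, then entity-escape.
function hrefWhole(x) {
  return '<a href="' + escapeHtmlAttrSketch(filterUrlSketch(x)) + '">';
}

// Value is inside the path: the scheme is fixed by the template, so just entity-escape.
function hrefPath(x) {
  return '<a href="/foo/' + escapeHtmlAttrSketch(x) + '">';
}

// Value is inside the query: percent-encode, then entity-escape.
function hrefQuery(x) {
  return '<a href="/foo?q=' + escapeHtmlAttrSketch(encodeURIComponent(String(x))) + '">';
}

console.log(hrefWhole('javascript:alert(1337)'));  // <a href="#zSoyz">
console.log(hrefPath('bar&baz/boo'));              // <a href="/foo/bar&amp;baz/boo">
console.log(hrefQuery('A is #1'));                 // <a href="/foo?q=A%20is%20%231">
```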
As long as you stick to
standard HTML attribute names, the
autoescaper figures out which attributes contain URLs, which contain CSS, etc. If you do decide
to define custom attributes such as
data-…
attributes, you can still use a naming convention to tell the autoescaper which attributes have
URL content: Names that start or end with "URL" or "URI", ignoring case, will be treated as having
URL values. For example, the autoescaper treats data-secondaryUrl,
foo:urlForLogin, and data-thesauri as having URL content;
but not data-curliewurly.
Precisely, /\bur[il]|ur[il]s?$/i is the set of custom attribute names with URL
values.
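The naming convention can be checked directly against the pattern quoted above. In this small sketch, only the regular expression comes from the text; the wrapper function isUrlAttributeName is a hypothetical name added for the example.

```javascript
// The pattern quoted in the text for custom attribute names that hold URLs.
const URL_ATTR_NAME = /\bur[il]|ur[il]s?$/i;

// Hypothetical wrapper: true if a custom attribute name is treated as having URL content.
function isUrlAttributeName(name) {
  return URL_ATTR_NAME.test(name);
}

const names = ['data-secondaryUrl', 'foo:urlForLogin', 'data-thesauri', 'data-curliewurly'];
for (const name of names) {
  console.log(name + ': ' + isUrlAttributeName(name));
}
// data-secondaryUrl: true
// foo:urlForLogin: true
// data-thesauri: true
// data-curliewurly: false
```

Note that data-thesauri matches only because it happens to end in "uri", so the convention is worth keeping in mind when naming unrelated custom attributes.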
Substitutions in Trusted Resource URLs
Values substituted into trusted resource URIs are handled almost the same way as values in ordinary URIs, except that the value must be a TrustedResourceUrl.
| Template | Handling | Input | Output |
|---|---|---|---|
| <script src="{$x}"> | Entity-escape and filter out non-TrustedResourceUri. | ({ "x": "foo" }) | <script src="about:invalid#zSoyz"> |
| | | ({ "x": goog.html.TrustedResourceUrl.… | <script src="http://foo/"> |
| | | ({ "x": goog.html.TrustedResourceUrl.… | <script src="/foo?a=b&amp;c=d"> |
| | | ({ "x": goog.html.TrustedResourceUrl.… | <script src="javascript:alert(1337)"> |
| <script src="/foo/{$x}"> | Entity-escape and filter out non-TrustedResourceUri. | ({ "x": goog.html.TrustedResourceUrl.… | <script src="/foo/bar"> |
| | | ({ "x": goog.html.TrustedResourceUrl.… | <script src="/foo/bar&amp;baz/boo"> |
| <script src="/foo?q={$x}"> | Entity-escape and filter out non-TrustedResourceUri. | ({ "x": goog.html.TrustedResourceUrl.… | <script src="/foo?q=bar&amp;baz=boo"> |
| | | ({ "x": goog.html.TrustedResourceUrl.… | <script src="/foo?q=A is #1"> |
Substitutions in JavaScript
Values in JavaScript that are inside quotes are dealt with differently from those outside quotes.
| Template | Handling | Input | Output |
|---|---|---|---|
| <script>alert('{$x}');</script> | Escaped inside quotes. | ({ "x": "O'Reilly Books" }) | <script>alert('O\'Reilly Books');</script> |
| | | ({ "x": new soydata.SanitizedJsStrChars(… | <script>alert('O\'Reilly Books');</script> |
| <script>alert({$x});</script> | Without quotes, treated as a value. | ({ "x": "O'Reilly Books" }) | <script>alert('O\'Reilly Books');</script> |
| | | ({ "x": 42 }) | <script>alert( 42 );</script> |
| | | ({ "x": true }) | <script>alert( true );</script> |
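The difference between the quoted and unquoted cases can be sketched as follows. Both function names are hypothetical, and the set of escaped characters is an assumption. The sketch writes a single quote as \x27 rather than \' as in the examples above; both denote the same character in a JavaScript string, and the hex form has the advantage of containing no quote characters at all.

```javascript
// Hypothetical helper: escapes text for use inside a quoted JavaScript string that
// may itself be embedded in HTML.
function escapeJsStringSketch(text) {
  return String(text).replace(/[\\'"<>&=\n\r\u2028\u2029]/g, (ch) => {
    if (ch === '\n') return '\\n';
    if (ch === '\r') return '\\r';
    if (ch === '\\') return '\\\\';
    const hex = ch.charCodeAt(0).toString(16);
    // Two hex digits fit everything below U+0100; line separators need \uXXXX.
    return hex.length <= 2 ? '\\x' + hex.padStart(2, '0') : '\\u' + hex.padStart(4, '0');
  });
}

// Hypothetical helper: renders a value where a JavaScript expression is expected.
// Strings are quoted and escaped; numbers and booleans are padded with spaces so
// they cannot merge with neighboring tokens.
function escapeJsValueSketch(value) {
  if (typeof value === 'number' || typeof value === 'boolean') {
    return ' ' + String(value) + ' ';
  }
  if (value === null || value === undefined) {
    return ' null ';
  }
  return "'" + escapeJsStringSketch(value) + "'";
}

console.log("alert('" + escapeJsStringSketch("O'Reilly Books") + "');");
// alert('O\x27Reilly Books');
console.log('alert(' + escapeJsValueSketch("O'Reilly Books") + ');');
// alert('O\x27Reilly Books');
console.log('alert(' + escapeJsValueSketch(42) + ');');
// alert( 42 );
```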
Substitutions in CSS
Values in CSS can be parts of classes, IDs, quantities, colors, or URLs.
| Template | Notes | Input | Output |
|---|---|---|---|
| <style>div#{$id} {lb} {rb}</style> | Classes and IDs | ({ "id": "foo-bar" }) | <style>div#foo-bar { }</style> |
| <div style="color: {$x}"> | Quantities | ({ "x": "red" }) | <div style="color: red"> |
| | | ({ "x": "#f00" }) | <div style="color: #f00"> |
| | | ({ "x": "expression('alert(1337)')" }) | <div style="color: zSoyz"> |
| <div style="margin-{$ltr-dir}: 1em"> | Property Names | ({ "ltr-dir": "left" }) | <div style="margin-left: 1em"> |
| | | ({ "ltr-dir": "right" }) | <div style="margin-right: 1em"> |
| <style>p {lb} font-family: '{$x}' {rb}</style> | Quoted Values | ({ "x": "Arial" }) | <style>p { font-family: 'Arial' }</style> |
| | | ({ "x": "</style>" }) | <style>p { font-family: '\3c \2f style\3e ' }</style> |
| <div style="background: url({$x})"> | URLs in CSS are handled as in attributes above | ({ "x": "/foo/bar" }) | <div style="background: url(/foo/bar)"> |
| | | ({ "x": "javascript:alert(1337)" }) | <div style="background: url(#zSoyz)"> |
| | | ({ "x": "?q=(O'Reilly) OR Books" }) | <div style="background: url(?q=%28O%27Reilly%29%20OR%20Books)"> |
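Two of the CSS cases, filtered quantities and escaped quoted values, can be sketched as below. The function names, the SAFE_CSS_VALUE whitelist, and the set of escaped characters are assumptions for illustration and are deliberately stricter and simpler than a production filter.

```javascript
// Hypothetical whitelist for unquoted CSS values: letters, digits, and a few
// punctuation characters. Parentheses and quotes are absent, so function-like
// values such as expression(...) cannot pass.
const SAFE_CSS_VALUE = /^[a-z0-9#%. +-]+$/i;

// Returns the value if it looks like a harmless keyword, color, or quantity,
// and the placeholder zSoyz otherwise.
function filterCssValueSketch(value) {
  const text = String(value);
  return SAFE_CSS_VALUE.test(text) ? text : 'zSoyz';
}

// Escapes text for use inside a quoted CSS string using hex escapes. The trailing
// space terminates each escape so following characters are not read as hex digits.
function escapeCssStringSketch(text) {
  return String(text).replace(/[<>"'\\&\/]/g, (ch) => '\\' + ch.charCodeAt(0).toString(16) + ' ');
}

console.log('color: ' + filterCssValueSketch('#f00'));                       // color: #f00
console.log('color: ' + filterCssValueSketch("expression('alert(1337)')"));  // color: zSoyz
console.log("font-family: '" + escapeCssStringSketch('</style>') + "'");
// font-family: '\3c \2f style\3e '
```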
Print Directives
Autoescaping works by automatically adding
print directives to templates, so
you can remove the print directives that you explicitly added, including
|escapeJs, |escapeUri, |escapeHtml, and especially those
dangerous |noAutoescape directives.
In case you have defined custom
print directives, the autoescaper does not
interfere with any
{print …} command containing a directive that returns true
from shouldCancelAutoescape(). Thus, if the escape directive transforms plain text
to the expected content type, then override shouldCancelAutoescape() to
return true. If your custom directive expects already-escaped input instead of plain
text, you can implement SanitizedContentOperator to get the autoescaper to insert
escaping directives before your directive so they produce the already-escaped input and
pipe it to your directive.
Guarantees
Autoescaping augments Closure Templates to choose an appropriate encoding for each dynamic value so that even if a particular dynamic value can be controlled by an attacker, certain safety properties hold.
Specifically, if a template, and all the templates that it calls have
autoescape="deprecated-contextual" or autoescape="strict", and have no
manual escaping overrides such as |noAutoescape, then the following properties hold:
Structure is preserved
If you, the Closure Templates author, write
<b>{$x}</b>, then the
tags <b> and
</b> always correspond to matched tags in the template output
regardless of the value of $x.
No dynamic value can change the meaning of an HTML, CSS, or JavaScript token in the template, or correspondences between pairs of matched tokens.
Only code in the template is executed
Dynamic values cannot specify unsafe code. Any code hidden in dynamic values (whether via
<script> elements, javascript: URIs, or some
other mechanism) is treated as plain text and encoded properly on output instead of being
rendered as code.
Dynamic values that appear in JavaScript (e.g. $message in
<script>alert('{$message}')</script>) are encoded to
expressions without side effects or free variables (to preserve privacy constraints).
Given { "message": "'//\ndoEvil()//" }, the template
produces
<script>alert('\x27//\ndoEvil()//');</script>, which
alerts the garbage string passed in instead of calling doEvil.
All code in the template is executed
A dynamic value cannot cause code to fail to parse. Some applications have security-critical code that they need to run if JavaScript is enabled. Take for example the following template:
<script>
var s = '{$s}';
doSecurityCriticalStuff();
</script>
If the value of the variable s is a newline character "\n", then a non-autoescaped
template would produce the following output:
<script>
var s = '
';
doSecurityCriticalStuff();
</script>
The autoescaped version of the template instead produces:
<script>
var s = '\n';
doSecurityCriticalStuff();
</script>
which parses properly.
If a template or the templates that it calls do not have autoescaping enabled, or use explicit
escaping directives like |noAutoescape incorrectly, then the autoescaper makes a best
effort to preserve these properties but might fail.
Content Security Policy
Closure Templates has an optional pass that supports
Content Security Policy nonces. CSP nonces are a
defense-in-depth technique for restricting the execution of <script> and
<style> blocks. With CSP nonces, even if an attacker can inject scripts into
your document, they will be unable to execute unless they can also guess the CSP nonce.
(See this
article for a good overview.)
When CSP nonces are enabled in Closure Templates, autoescaped templates
have nonce="..." added to <script> and
<style> elements declared inside them:
<script>...</script>
becomes
<script
{if $ij.csp_nonce} nonce="{$ij.csp_nonce}"{/if}>...</script>
There are three steps to configuring CSP nonces with Closure Templates:
- Configure your web server to compute nonces and send them in CSP response headers.
- Configure Closure Templates to add nonces to autoescaped templates.
- Make the nonces computed in step 1 available to the templates from step 2.
Step 1 is outside the scope of this document. General considerations for nonces include generating strong random numbers (article) and not reusing nonces (article).
Step 2 is backend specific: see Tofu and JavaScript below.
For step 3, render with an injected data bundle that includes an $ij.csp_nonce value that is a valid nonce.
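As a rough sketch of steps 1 and 3 on a Node.js server, the code below generates a fresh nonce per response, builds a matching header value, and shapes the injected data. The function names are hypothetical, the 16-byte nonce length and the exact policy string are assumptions, and the call that actually renders the template with the injected data is omitted because it depends on your backend.

```javascript
const crypto = require('crypto');

// Generates an unpredictable nonce from a cryptographically strong source.
// Call this once per response and never reuse the result.
function generateCspNonce() {
  return crypto.randomBytes(16).toString('base64');
}

// Builds a Content-Security-Policy header value that only allows script and
// style blocks carrying the given nonce. (Assumed policy; adjust to your needs.)
function buildCspHeaderValue(nonce) {
  return "script-src 'nonce-" + nonce + "'; style-src 'nonce-" + nonce + "'";
}

// Shapes the injected data so templates can read the nonce as $ij.csp_nonce.
function buildInjectedData(nonce) {
  return {csp_nonce: nonce};
}

const nonce = generateCspNonce();
const headerValue = buildCspHeaderValue(nonce);
const injectedData = buildInjectedData(nonce);

console.log('Content-Security-Policy: ' + headerValue);
console.log(injectedData);
```

The same nonce value must appear both in the response header and in the injected data for that response; otherwise the browser will refuse to run the template's own scripts.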
Tofu
Enable CSP support by calling
mySoyFileSetBuilder
.getGeneralOptions()
.setSupportContentSecurityPolicy(true);
JavaScript
Pass the --supportContentSecurityPolicy=true command line flag
to SoyToJsSrcCompiler to enable CSP support. Enabling this will increase the size
of generated code for templates that include embedded scripts or styles.