Expanded URLs 2.0
With the Expanded URLs enrichment, Gnip automatically expands shortened URLs included in the original payload of an activity and adds the resulting URL to the payload as an additional piece of data. Gnip also extracts key HTML data (the page title and description) from the resulting URL and includes it in the payload.
Expanded URL data will be included in Gnip’s PowerTrack, Replay, Volume Stream, Search, and Historical PowerTrack APIs.
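This page does not describe how the title and description are read from the destination page. As a hedged illustration only, assuming the page's HTML is already in hand, the standard library's `html.parser` can pull out the `<title>` text and the `<meta name="description">` content. The class and function names are hypothetical and are not Gnip's implementation.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class TitleDescriptionParser(HTMLParser):
    """Collect the first <title> text and the first meta description."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default
        self.title: Optional[str] = None
        self.description: Optional[str] = None
        self._in_title = False
        self._title_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag == "title" and self.title is None:
            self._in_title = True
        elif tag == "meta" and self.description is None:
            attr_map = {name: (value or "") for name, value in attrs}
            if attr_map.get("name", "").lower() == "description":
                self.description = attr_map.get("content", "").strip()

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._title_parts).strip()

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)  # text may arrive in several chunks


def extract_title_and_description(html: str) -> Tuple[Optional[str], Optional[str]]:
    parser = TitleDescriptionParser()
    parser.feed(html)
    parser.close()
    return parser.title, parser.description
```

A page with neither element simply yields `(None, None)`, which a caller could treat as "no HTML data available".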
Expanded URL Data
| Original Format Field Name | Activity Streams Field Name | Example Value | Description |
|---|---|---|---|
| entities.urls.url | gnip.urls.url | https://t.co/b9ZdzRxzFK | The shortened t.co URL that the link is converted to when the Tweet is created. |
| entities.urls.unwound.url | gnip.urls.expanded_url | http://www.today.com/parents/joke-s-you-kid-11-family-friendly-april-fools-pranks-t83276 | The expanded URL: the final destination that our URL resolution process was able to reach. |
| entities.urls.unwound.status | gnip.urls.expanded_status | 200 | The HTTP status code for the final destination that our URL resolution process was able to reach. |
| entities.urls.unwound.title | gnip.urls.expanded_url_title | The joke's on you, kid: 11 family-friendly April Fools pranks | The HTML title from the final destination that our URL resolution process was able to reach. |
| entities.urls.unwound.description | gnip.urls.expanded_url_description | If your kids are practical jokers, turn this April Fools' Day into a family affair. | The HTML description from the final destination that our URL resolution process was able to reach. |
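The table above is a one-to-one renaming, so it can be written down as data. The sketch below, with a hypothetical `to_activity_streams` helper, renames one `entities.urls[]` item into `gnip.urls[]` field names. Note that the Original Format entry also carries a top-level `expanded_url`; per the table, the Activity Streams `expanded_url` comes from `unwound.url`, not from that field.

```python
from typing import Any, Dict, Tuple

# Original Format key path (relative to one entities.urls[] item)
# -> Activity Streams key (within one gnip.urls[] item), per the table above.
FIELD_MAP: Dict[Tuple[str, ...], str] = {
    ("url",): "url",
    ("unwound", "url"): "expanded_url",
    ("unwound", "status"): "expanded_status",
    ("unwound", "title"): "expanded_url_title",
    ("unwound", "description"): "expanded_url_description",
}


def to_activity_streams(url_entity: Dict[str, Any]) -> Dict[str, Any]:
    """Rename one Original Format URL entity into Activity Streams field names.

    Fields missing from the input (e.g. no "unwound" block) are omitted.
    """
    out: Dict[str, Any] = {}
    for path, as_key in FIELD_MAP.items():
        value: Any = url_entity
        for key in path:
            if not isinstance(value, dict) or key not in value:
                value = None
                break
            value = value[key]
        if value is not None:
            out[as_key] = value
    return out
```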
The Historical PowerTrack, Search, and PowerTrack APIs support filtering based on Expanded URL data. See the appropriate product documentation for details on which operators are available for filtering on Expanded URL data.
HTTP Status Codes
The Expanded URLs enrichment also provides the HTTP status code of the final URL we attempted to unwind. Normally this is 200; 400-series values indicate problems resolving the URL.
Various status codes may be returned when unwinding a URL. Whenever we receive a redirect, we follow it, and we keep following redirects until one of the following happens:
- We receive a 200-series status code (success)
- We receive any other non-redirect status code (failure)
- The final URL cannot be resolved in a reasonable amount of time (a 408 Timeout is returned)
- An exception occurs
If an exception occurs, the returned status code depends on its cause:
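The stopping rules above can be sketched as a loop. This is a minimal illustration under stated assumptions: `head` is a hypothetical callable standing in for one HEAD request that returns a status code and an optional redirect target, `max_hops` is an assumed substitute for the time limit (exhausting it is reported as 408), and exceptions are left for the caller to map to a status code.

```python
from typing import Callable, Optional, Tuple

# Hypothetical hop function: URL -> (status_code, next_url_or_None).
HeadFn = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def unwind(url: str, head: HeadFn, max_hops: int = 10) -> Tuple[str, int]:
    """Follow redirects until success, failure, or the hop budget runs out.

    Returns (final_url, status). Running out of hops stands in for the
    "could not be resolved in a reasonable amount of time" case (408).
    """
    current = url
    for _ in range(max_hops):
        status, location = head(current)
        if status in REDIRECT_CODES and location:
            current = location  # redirect: keep following the chain
            continue
        # 2xx means success; any other non-redirect code is a failure.
        return current, status
    return current, 408
```

A redirect cycle (A to B to A) therefore terminates with 408 instead of looping forever.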
| Reason | Status Code Returned |
|---|---|
| SSL Exceptions | 403 (Forbidden) |
| Unwinding not allowed by URL | 405 (Method Not Allowed) |
| Socket Timeout | 408 (Timeout) |
| Unknown Host Exception | 404 (Not Found) |
| Unsupported Operation | 404 (Not Found) |
| Connect Exception | 404 (Not Found) |
| Illegal Argument | 400 (Bad Request) |
| Everything else | 400 (Bad Request) |
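The table can be expressed as an ordered lookup. The reasons listed read like Java exception names; the Python exception types below are assumed rough analogues, not what Gnip itself uses, and `UnwindNotAllowed` is a hypothetical marker for the 405 case. Order matters because some types share base classes (for example, `ssl.SSLCertVerificationError` is also a `ValueError`), so SSL errors are checked first.

```python
import socket
import ssl


class UnwindNotAllowed(Exception):
    """Hypothetical marker for a URL that does not allow unwinding."""


# Ordered (exception type, status code) pairs mirroring the table above.
_EXCEPTION_STATUS = [
    (ssl.SSLError, 403),            # SSL exceptions -> Forbidden
    (UnwindNotAllowed, 405),        # unwinding not allowed by URL
    (socket.timeout, 408),          # socket timeout -> Timeout
    (socket.gaierror, 404),         # unknown host -> Not Found
    (NotImplementedError, 404),     # unsupported operation -> Not Found
    (ConnectionError, 404),         # connect exception -> Not Found
    (ValueError, 400),              # illegal argument -> Bad Request
]


def status_for_exception(exc: BaseException) -> int:
    """Return the status code to report for an exception raised while unwinding."""
    for exc_type, status in _EXCEPTION_STATUS:
        if isinstance(exc, exc_type):
            return status
    return 400  # everything else -> Bad Request
```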
Sample Payload
In Original Format, expanded URL data will be included in the entities.urls.unwound section of the payload.
{
  "entities": {
    "urls": [
      {
        "url": "https://t.co/b9ZdzRxzFK",
        "expanded_url": "http://www.today.com/parents/joke-s-you-kid-11-family-friendly-april-fools-pranks-t83276",
        "display_url": "today.com/parents/joke-s…",
        "unwound": {
          "url": "http://www.today.com/parents/joke-s-you-kid-11-family-friendly-april-fools-pranks-t83276",
          "status": 200,
          "title": "The joke's on you, kid: 11 family-friendly April Fools pranks",
          "description": "If your kids are practical jokers, turn this April Fools' Day into a family affair."
        },
        "indices": [
          43,
          66
        ]
      }
    ]
  }
}
In Activity Streams Format, expanded URL data will be included in the gnip.urls section of the payload.
{
  "gnip": {
    "urls": [
      {
        "url": "https://t.co/b9ZdzRxzFK",
        "expanded_url": "http://www.today.com/parents/joke-s-you-kid-11-family-friendly-april-fools-pranks-t83276",
        "expanded_status": 200,
        "expanded_url_title": "The joke's on you, kid: 11 family-friendly April Fools pranks",
        "expanded_url_description": "If your kids are practical jokers, turn this April Fools' Day into a family affair."
      }
    ]
  }
}
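Because `expanded_status` records whether resolution succeeded, a consumer may want to keep only successfully resolved links. A minimal sketch, assuming the Activity Streams payload has already been parsed into a dict; `resolved_links` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Optional, Tuple


def resolved_links(activity: Dict[str, Any]) -> List[Tuple[str, Optional[str]]]:
    """Return (expanded_url, expanded_url_title) for each gnip.urls entry
    that resolved with a 200-series status; failed resolutions are skipped."""
    links: List[Tuple[str, Optional[str]]] = []
    for entry in activity.get("gnip", {}).get("urls", []):
        status = entry.get("expanded_status")
        if isinstance(status, int) and 200 <= status < 300 and entry.get("expanded_url"):
            links.append((entry["expanded_url"], entry.get("expanded_url_title")))
    return links
```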
FAQ
To resolve a shortened link as described above, our system sends HTTP HEAD requests to the URL provided, and follows any redirects until it arrives at the final URL. This URL (NOT the content of the page itself) is then included in the data payloads we send to our customers.
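A single hop of this process, one HEAD request whose redirect is reported rather than followed, could look like the sketch below. It uses only Python's `urllib.request`; the names `head_once` and `_NoRedirect` are hypothetical, and disabling environment proxies is a simplifying assumption of this sketch, not documented Gnip behavior.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional, Tuple


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so each hop can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx response


# ProxyHandler({}) ignores environment proxy settings in this sketch.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}), _NoRedirect)


def head_once(url: str, timeout: float = 5.0) -> Tuple[int, Optional[str]]:
    """Send one HEAD request; return (status, absolute Location or None)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with _OPENER.open(request, timeout=timeout) as response:
            return response.status, None
    except urllib.error.HTTPError as err:
        # 3xx (redirects are disabled), 4xx, and 5xx responses all land here.
        location = err.headers.get("Location")
        if location is not None:
            location = urllib.parse.urljoin(url, location)  # resolve relative targets
        return err.code, location
```

Network-level failures (DNS, refused connections, SSL errors) still propagate as exceptions, matching the exception-to-status mapping described earlier.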
For requests made to the Full Archive Search API, we currently support expanded URL data only for Tweets posted within the last 13 months.