Planet Python

Last update: November 14, 2016 10:47 AM

November 13, 2016

I’ve noticed that several of the Django functions, especially filter, take a kwargs-style set of arguments. I casually wondered about this but didn’t give it much thought until now. I ran into a very practical use for it, that should have been obvious, but wasn’t to me.

The Setup

In my recent redesign of the KidTasks project, I opted to take the easy route for tasks which repeat on particular weekdays and simply put a BooleanField for each day of the week into the model. This makes forms a bit easier but does cause some complications when trying to use a filter function to get a list of weekday-task pairs. I started with the naive (and un-good) method of filtering each weekday independently:

# note: syntax on the filter here is from memory - might be incorrect!
qs = RepeatingTask.objects.filter(kid=self).filter('monday'=True)
tasks.append((day, [task for task in qs]))
qs = RepeatingTask.objects.filter(kid=self).filter('tuesday'=True)
tasks.append((day, [task for task in qs]))
    [etc]

While that will work, I’m sure you’ll agree that it’s ugly.

Kwargs to the rescue!

It turns out that the solution to this problem lies with those mysterious kwargs function signatures. It turns out that filter takes one. This allows you to create a dict which you can create on the fly. Which, in turn, allows you to use a variable to define the field on which you’re filtering!

days = [
    'monday',
    'tuesday',
    'wednesday',
    'thursday',
    'friday',
    'saturday',
    'sunday'
]

...

for day in self.days:
    qs = RepeatingTask.objects.filter(kid=self).filter(**{ day : True })
    tasks.append((day, [task for task in qs]))

Much nicer. While this is likely obvious to most seasoned Django users, I thought it tied things together nicely.

November 13, 2016 03:45 PM

Jeff Knupp

How Python Makes Working With Data More Difficult in the Long Run

Before we begin, let's be clear on terminology. When I refer to "working with data" in the context of software development I could mean one of two things:

Interactively working with data, with perhaps Jupyter (née IPython) or the live interpreter
Writing, testing, reading, reviewing and maintaining programs that primarily manipulate data

In short: Python is awesome for interactive data analysis but terrible for writing long-lived programs dealing with complicated data structures. The second definition is perhaps overly broad, but I'll clarify in a minute. Before that, let me be the first to say that Python is an incredible language for interactively working with, or exploring, data. The ecosystem of third-party packages and tools that have sprung up around data manipulation, visualization, and data science in general has been nothing short of remarkable.

If working with interactive data is your nail, Python should be your hammer.

But what about that second interpretation? Actually, it can be thought of as a logical extension of the first. Imagine you're writing a program to query a database for a search term, do some sentiment analysis, and return the results in JSON. Working interactively with the database results, the results returned by your sentiment analysis library, and the JSON you produce is the natural first step. You're still in "exploration mode". Not really writing the program yet, just seeing what the data looks like and how you'll need to manipulate it.

Once you get a "feel" for the "shape" of the data at each step, you can begin to write your program. You'll likely refer back to examples of the output you created during exploration when implementing the logic of your program. Particularly with deeply nested data structures (I'm looking at you, "everyone's abuse of JSON..."), it's often too difficult to keep the "shape" of the data at each stage in your head.

But Python makes working with data easy, so your program is finished in no time. It works, it's well-documented, and even has 100% test coverage. If you never need to return to this code, huzzah! Your job is done.

Dynamic Typing Is The Root Of All Evil (j/k...kind of...)

The very property of Python that made your program so easy to write is the same one that will make it difficult to review, read, and (most importantly) maintain. Python's dynamic type system means that, in most cases, you don't have to enumerate the complete set of fields, types, and value constraints that define the data as it moves through your system. You can just jam it all in a dict! Heterogeneous values FTW!

The task above would be much more laborious and time-consuming in a statically typed language like C or Go. In Go, for example, to parse and return a JSON response from some web API, you first need to create a struct whose fields and field-types exactly match the structure of the response. Here is how one must prepare to work with a JSON response from etcd (taken from their client library):

type Response struct {
    // Action is the name of the operation that occurred. Possible values
    // include get, set, delete, update, create, compareAndSwap,
    // compareAndDelete and expire.
    Action string `json:"action"`

    // Node represents the state of the relevant etcd Node.
    Node *Node `json:"node"`

    // PrevNode represents the previous state of the Node. PrevNode is non-nil
    // only if the Node existed before the action occurred and the action
    // caused a change to the Node.
    PrevNode *Node `json:"prevNode"`

    // Index holds the cluster-level index at the time the Response was generated.
    // This index is not tied to the Node(s) contained in this Response.
    Index uint64 `json:"-"`
}

type Node struct {
    // Key represents the unique location of this Node (e.g. "/foo/bar").
    Key string `json:"key"`

    // Dir reports whether node describes a directory.
    Dir bool `json:"dir,omitempty"`

    // Value is the current data stored on this Node. If this Node
    // is a directory, Value will be empty.
    Value string `json:"value"`

    // Nodes holds the children of this Node, only if this Node is a directory.
    // This slice of will be arbitrarily deep (children, grandchildren, great-
    // grandchildren, etc.) if a recursive Get or Watch request were made.
    Nodes Nodes `json:"nodes"`

    // CreatedIndex is the etcd index at-which this Node was created.
    CreatedIndex uint64 `json:"createdIndex"`

    // ModifiedIndex is the etcd index at-which this Node was last modified.
    ModifiedIndex uint64 `json:"modifiedIndex"`

    // Expiration is the server side expiration time of the key.
    Expiration *time.Time `json:"expiration,omitempty"`

    // TTL is the time to live of the key in second.
    TTL int64 `json:"ttl,omitempty"`
}

The "json:..." part after each field describes what that field's name should be when the object is marshaled from a JSON message. And notice that, because Response contains a nested object (Node), we must fully define that nested object as well.

Note: to be fair, there are some shortcuts one might take in Go to reduce the need for a portion of the above, but they're rarely taken (and for good reason).

In Python, you'd be all like:

    result = make_etcd_call("some", "arguments", "here")

If you wanted to see if the node in question was a directory, you'd pound this out:

    if result.json()['node']['dir']:
        # make magic happen...

And the Python version is less code and takes less time to write than the Go version.

"I Don't See The Problem"

The Python version is better, right? Let's consider two definitions of "good code" so we can be clear what we mean by better.

Code that is short, concise, and can be written quickly
Code that is maintainable

If we're using the first definition, the Python version is "better". If we're using the second, it's far, far worse. The Go version, despite containing a boatload of boilerplate-ish definition code, makes clear the exact structure of the data we can expect in result.

Boss: "What can you tell me about the Python version, just by looking at our code above?" Me: "Uh, it's JSON and has a 'node' object which probably has a 'dir' field." Boss: "What type of value is in dir? Is it a boolean, a string, a nested object? Me: "Uh, I dunno. It's truthy, though!" Boss: "So is everything else in Python. Is dir guaranteed to be part of the node object in the response?" Me: "Uh...."

And I've met my "3-Uh" limit for describing what a portion of code does. If you refer to the Go version, you can answer those questions and sound like a damned genius in comparison. But these are exactly the sort of questions your peers should be asking in a code review. The answers to the questions in the Go version are self-evident. The answers for the Python version, not so much...

Making Changes

What happens when we need to make a change to the Python version? Perhaps we want to say "only make magic happen if the directory was just created, not for every response with a directory?"

It's pretty clear how to do that in the Go version. Compared to the Python version, the Go version is like the Library of Alexandria of etcd Responses. For the Python version, we have nothing local to refer to in order to figure out the structure of result and the change we need to make. We'll have to go look up the etcd HTTP API documentation. Let's hope that:

it exists
it is well maintained
the tubes aren't clogged

And this is a very simple change we're talking about on a very simple JSON object. I could tell horror stories about what happens when you get knee-deep in Elasticsearch JSON responses... (spoiler alert: response['hits_']['hits_']['hits_']...). The fun doesn't stop at just making the code change, though. Remember, we're professionals, so all of our code is peer reviewed and unit-tested. After updating the code correctly we can still barely reason about it. All of a sudden, we're back to that conversation between my boss and I where I say "Uh" a lot and he wonders why he didn't go into carpentry.

Everybody Panic!

I've painted a rather bleak picture of using Python to manipulate complex (and even not-so-complex) data structures in a maintainable way. In truth, however, it's a shortcoming shared by most dynamic languages. In the second half of this article, I'll describe what various people/companies are doing about it, from simple things like the movement towards "live data in the editor" all the way to the Dropboxian "type-annotate all the things". In short, there's a lot of interesting work going on in this space and lot's of people are involved (notice the second presenter name in that Dropbox deck ).

November 13, 2016 02:46 PM

PyPy Development

PyPy2.7 v5.6 released - stdlib 2.7.12 support, C-API improvements, and more

We have released PyPy2.7 v5.6 [0], about two months after PyPy2.7 v5.4. This new PyPy2.7 release includes the upstream stdlib version 2.7.12.

We continue to make incremental improvements to our C-API compatibility layer (cpyext). We pass all but 12 of the over-6000 tests in the upstream NumPy test suite, and have begun examining what it would take to support Pandas and PyQt.

Work proceeds at a good pace on the PyPy3.5 version due to a grant from the Mozilla Foundation, and some of those changes have been backported to PyPy2.7 where relevant.

The PowerPC and s390x backend have been enhanced with the capability to use SIMD instructions for micronumpy loops.

We changed timeit to now report average +/- standard deviation, which is better than the misleading minimum value reported in CPython.

We now support building PyPy with OpenSSL 1.1 in our built-in _ssl module, as well as maintaining support for previous versions.

CFFI has been updated to 1.9, improving an already great package for interfacing with C.

As always, this release fixed many issues and bugs raised by the growing community of PyPy users. We strongly recommend updating. You can download the PyPy2.7 v5.6 release here:

http://pypy.org/download.html

Downstream packagers have been hard at work. The Debian package is already available, and the portable PyPy versions are also ready, for those who wish to run PyPy on other Linux distributions like RHEL/Centos 5.

We would like to thank our donors for the continued support of the PyPy project.

We would also like to thank our contributors and encourage new people to join the project. PyPy has many layers and we need help with all of them: PyPy and RPython documentation improvements, tweaking popular modules to run on pypy, or general help with making RPython’s JIT even better.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7. It’s fast (PyPy and CPython 2.7.x performance comparison) due to its integrated tracing JIT compiler.
We also welcome developers of other dynamic languages to see what RPython can do for them.
This release supports:

x86 machines on most common operating systems (Linux 32/64 bits, Mac OS X 64 bits, Windows 32 bits, OpenBSD, FreeBSD)

newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux,

big- and little-endian variants of PPC64 running Linux,

s390x running Linux

What else is new?

(since the release of PyPy 5.4 in August, 2016)

There are many incremental improvements to RPython and PyPy, the complete listing is here.

Please update, and continue to help us make PyPy better.

Cheers, The PyPy team

[0] We skipped 5.5 since we share a code base with PyPy3, and PyPy3.3-v.5.5-alpha was released last month

November 13, 2016 10:32 AM

Bhishan Bhandari

Hello world!

Welcome to WordPress. This is your first post. Edit or delete it, then start writing!

November 13, 2016 09:56 AM

Vasudev Ram

Trapping KeyboardInterrupt and EOFError for program cleanup

By Vasudev Ram

Ctrl-C and Ctrl-Z handling

I had written this small Python utility for my own use, to show the ASCII code for any input character typed at the keyboard. Since it was a quick utility, I was initially just using Ctrl-C to exit the program. But that leaves behind a messy traceback, so I thought of trapping the exceptions KeyboardInterrupt (raised by Ctrl-C) and EOFError (raised by Ctrl-Z). With that, the program now exits cleanly on typing either of those keys.

Here is the resulting utility, char_to_ascii_code.py:

from __future__ import print_function
"""
char_to_ascii_code.py
Purpose: Show ASCII code for a given character, interactively, 
in a loop. Show trapping of KeyboardInterrupt and EOFError exceptions.
Author: Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
"""

print("This program shows the ASCII code for any given ASCII character.")
print("Exit the program by pressing Ctrl-C or Ctrl-Z.")
print()

while True:
    try:
        c = raw_input( \
        "Enter an ASCII character to see its ASCII code: ")
        if len(c) != 1:
            print("Error: need a string of length 1; retry.")
            continue
        print("Character:", c)
        print("Code:", ord(c))
    except KeyboardInterrupt as ki:
        print("Caught:", repr(ki))
        print("Exiting.")
        break
    except EOFError as eofe:
        print("Caught:", repr(eofe))
        print("Exiting.")
        break

Here is a sample run, that shows the ASCII codes for the comma, tab and pipe characters, which are commonly used as field delimiters in Delimiter-Separated Value (DSV) files.

$ python char_to_ascii_code.py
This program shows the ASCII code for any given ASCII character.
Exit the program by pressing Ctrl-C or Ctrl-Z.

Enter an ASCII character to see its ASCII code, or Ctrl-C to exit: ,
Character: ,
Code: 44
Enter an ASCII character to see its ASCII code, or Ctrl-C to exit:
Character:
Code: 9
Enter an ASCII character to see its ASCII code, or Ctrl-C to exit: |
Character: |
Code: 124
Enter an ASCII character to see its ASCII code, or Ctrl-C to exit: 
Caught: KeyboardInterrupt()
Exiting.
$

I pressed the Ctrl-C key combination to exit the program. Ctrl-C does not show on the screen, but the exception handler for it is activated, and prints the last message above.

Another run shows a few more codes and the trapping of the Ctrl-Z key combination.

$ python char_to_ascii_code.py
This program shows the ASCII code for any given ASCII character.
Exit the program by pressing Ctrl-C or Ctrl-Z.

Enter an ASCII character to see its ASCII code, or Ctrl-C to exit: !
Character: !
Code: 33
Enter an ASCII character to see its ASCII code, or Ctrl-C to exit: ~
Character: ~
Code: 126
Enter an ASCII character to see its ASCII code, or Ctrl-C to exit: ^Z
Caught: EOFError()
Exiting.

- Vasudev Ram - Online Python training and consulting

Get updates on my software products / ebooks / courses.

Jump to posts: Python DLang xtopdf

Subscribe to my blog by email

My ActiveState recipes

Share |

November 13, 2016 03:49 AM

November 12, 2016

Dougal Matthews

Create an Excellent Python Dev Env

There are a huge number of Python dev tools around, a number of them are essential for my day to day development. However, they tend to suffer from a lack of discoverability and it takes a while to find what works for you.

I'm going to quickly share what I use, some of these are well known, some are probably not. I'd expect most people to pick and choose from this post as you are unlikely to want everything I use, but there should be something useful for most people.

These are my primary goals:

Be able to install any Python version easily.
Don't ever touch the system Python.
An easy way to setup virtualenvs for specific projects.
Install and isolate a number of Python tools.

How do we get there?

pyenv

pyenv pitches itself as "simple python version management" and it does just that. Once setup, you can easily install and switch between Python versions, including specific point releases. pyenv install --list reveals it knows how to install a whopping 271 different Python versions at the moment from cpython 2.1.3 up to 3.7-dev and pypy and stackless.

The install process is a bit manual, but there is an install tool that makes it easier. After installing, I do something like this:

pyenv install -s 2.7.12;
pyenv install -s 3.5.2;
pyenv install -s 3.4.5;
pyenv install -s pypy-5.4.1;
pyenv global 2.7.12 3.5.2 3.4.5 pypy-5.4.1;

This installs the Python versions I typically need, and then sets them as the global default. The order is important, 2.7.12 becomes the default for python as it is first and 3.5.2 becomes the default for python3.

If you just want to use a specific Python version in a directory, and it's subdirectories, you can run the command pyenv local 3.5.2 and it will create a .python-version file. Warning, if you do this in your home directory by mistake it can be very confusing.

One feature I'd love pyenv to have, is a way to tell it to install a Python version (like 2.7 or 3.5) and have it automatically install the latest point release (and add a new command that removes and updates them when needed)

pyenv-virtualenv

For a long time I was a big user of virtualenvwrapper, however, my transition to pyenv and fish caused some issues. I stumbled on pyenv-virtualenv (not to be mistaken with pyenv-virtualenvwrapper which also doesn't support fish) which covers all my needs. I wrote a few fish functions to make it a little easier to use. It isn't hard, but maybe just a little verbose.

For example, here is a handy way to make a temporary virtualenv, I found this feature of virtualenvwrapper (the mktmpenv command) particularly useful.

function venv-tmp
  set venv_tmp_name "tmp-"(random)
  pyenv virtualenv (expr substr (python --version 2>&1) 8 20) $venv_tmp_name
  venv-activate $venv_tmp_name
end

function venv-tmp-cleanup
  for val in (pyenv versions | grep "/envs/tmp-")
    venv-rm (basename $val)
  end
end

Generally it doesn't give me much over what virtualenvwrapper did (other than fish support) but I do like that it is managed by pyenv and integrates well.

pipsi

pipsi is a more recent addition to my setup. It is a fairy simple tool which allows you to install Python CLI tools in their own virtualenv and then the command is added to your path. The main advantage here is that they are all isolated and don't need to have compatible requirements. Uninstalling is also much cleaner and easier - you just delete the virtualenv.

I install a bunch of Python projects this way, here are some of the most useful.

tox: My defacto way of running tests.
mkdocs: A beautifully simple documentation tool (I might be biased).
git-review: The git review command for gerrit integration.
flake8: Python linting, mostly installs like this for vim.

Putting it all together

So, overall I don't actually use that many projects, but I have very happy with how it works. I have the setup automated, and it looks like this.

# pyenv
if [ ! -d ~.pyenv ]; then
    curl -L https://raw.githubusercontent.com/yyuu/pyenv-installer/master/bin/pyenv-installer | bash
    git clone https://github.com/yyuu/pyenv-virtualenv.git ~/.pyenv/plugins/pyenv-virtualenv
else
    pyenv update
fi;

pyenv install -s 2.7.12;
pyenv install -s 3.5.2;
pyenv install -s 3.4.5;
pyenv install -s pypy-5.4.1;
pyenv global 2.7.12 3.5.2 3.4.5 pypy-5.4.1;
~/.pyenv/shims/pip install -U pip pipsi
rm -rf ~/.local/venvs
~/.pyenv/shims/pipsi install tox
~/.pyenv/shims/pipsi install mkdocs
~/.pyenv/shims/pipsi install git-review
~/.pyenv/shims/pipsi install 1pass
~/.pyenv/shims/pipsi install flake8
~/.pyenv/shims/pipsi install yaql
~/.pyenv/shims/pipsi install livereload

The summary is, first install pyenv and setup the Python versions you need. Then install pipsi into the default pyenv environment and use that to install the other tools. The system Python should never be touched.

A couple of things are missing as you'll need to setup paths and so on, so please do look at the install guides for each.

November 12, 2016 03:45 PM

Weekly Python StackOverflow Report

(xlv) stackoverflow python report

These are the ten most rated questions at Stack Overflow last week.
Between brackets: [question score / answers count]
Build date: 2016-11-12 08:53:57 GMT

November 12, 2016 08:54 AM

Lintel Technologies

FCM – send push notifications using Python

What is FCM ?

FCM – Firebase Cloud Messaging is a cross-platform ( Android, iOS and Chrome ) messaging solution that lets you reliably deliver messages at no cost. FCM is best suited if you want to send push notification to your app which you built to run on Android and iOS. The advantage you get is you don’t have to separately deal with GCM (Google Cloud Messaging deprecated now) and Apple’s APNS. You hand over your notification message to FCM and FCM takes care of communicating with apple’s APNS and Android messaging servers to reliably deliver those messages.

Using FCM we can send message to single device or multiple devices. There are two different types of messages, notification and data. Notification messages include JSON keys that are understood and interpreted by phone’s operating system. If you want to include customized app specific JSON keys use data message. You can combine both notification and data JSON objects in single message. You can also send messages with different priority.

Note : – You need to set

priority

high

if you want phone to wake up and show notification on screen

Sending message with Python

We can use PyFCM to send messages via FCM. PyFCM is good for synchronous ( blocking ) python. We will discuss non-blocking option in next paragraph.

Install PyFCM using following command

pip install pyfcm

The following code will send a push notification to

from pyfcm import FCMNotification

push_service = FCMNotification(api_key="<api-key>")

# OR initialize with proxies

proxy_dict = {
          "http"  : "http://127.0.0.1",
          "https" : "http://127.0.0.1",
        }
push_service = FCMNotification(api_key="<api-key>", proxy_dict=proxy_dict)

# Your api-key can be gotten from:  https://console.firebase.google.com/project/<project-name>/settings/cloudmessaging

registration_id = "<device registration_id>"
message_title = "Uber update"
message_body = "Hi john, your customized news for today is ready"
result = push_service.notify_single_device(registration_id=registration_id, message_title=message_title, message_body=message_body)

print result
 
# Send to multiple devices by passing a list of ids.
registration_ids = ["<device registration_id 1>", "<device registration_id 2>", ...]
message_title = "Uber update"
message_body = "Hope you're having fun this weekend, don't forget to check today's news"
result = push_service.notify_multiple_devices(registration_ids=registration_ids, message_title=message_title, message_body=message_body)

print result

So, the PyFCM API is the pretty straight forward to use.

Sending FCM push notification using Twisted

PyFCM discussed in above paragraph is good enough if you want to send messages in blocking fashion. If you have to send high number of concurrent messages then using Twisted is a good option.

Twisted Matrix

Network operations performed using twisted library don’t block. Thus it’s a good choice when network concurrency is required by program. We can use txFCM library to send FCM messages using twisted

Install txFCM using following command

pip install txfcm

Following code send FCM message using txFCM

from txfcm import TXFCMNotification
from twisted.internet import reactor

push_service = TXFCMNotification(api_key="<api-key>")

# Your api-key can be gotten from:  https://console.firebase.google.com/project/<project-name>/settings/cloudmessaging
# Send to multiple devices by passing a list of ids.
registration_ids = ["<device registration_id 1>", "<device registration_id 2>", ...]
message_title = "Uber update"
message_body = "Hope you're having fun this weekend, don't forget to check today's news"
df = push_service.notify_multiple_devices(registration_ids=registration_ids, message_title=message_title, message_body=message_body)

def got_result(result):
    print result

df.addBoth(got_result)
reactor.run()

txFCM is built on top of PyFCM so all the API call that are available in PyFCM are also available in txFCM.

The post FCM – send push notifications using Python appeared first on Lintel Technologies Blog.

November 12, 2016 07:37 AM

Glyph Lefkowitz

What are we afraid of?

I’m crying as I write this, and I want you to understand why.

Politics is the mind-killer. I hate talking about it; I hate driving a wedge between myself and someone I might be able to participate in a coalition with, however narrow. But, when you ignore politics for long enough, it doesn't just kill the mind; it goes on to kill the rest of the body, as well as anyone standing nearby. So, sometimes one is really obligated to talk about it.

Today, I am in despair. Donald Trump is an unprecedented catastrophe for American politics, in many ways. I find it likely that I will get into some nasty political arguments with his supporters in the years to come. But hopefully, this post is not one of those arguments. This post is for you, hypothetical Trump supporter. I want you to understand why we¹ are not just sad, that we are not just defeated, but that we are in more emotional distress than any election has ever provoked for us. I want you to understand that we are afraid for our safety, and for good reason.

I do not believe I can change your views; don’t @ me to argue, because you certainly can’t change mine. My hope is simply that you can read this and at least understand why a higher level level of care and compassion in political discourse than you are used to may now be required. At least soften your tone, and blunt your rhetoric. You already won, and if you rub it in too much, you may be driving people to literally kill themselves.

First let me list the arguments that I’m not making, so you can’t write off my concerns as a repeat of some rhetoric you’ve heard before.

I won’t tell you about how Trump has the support of the American Nazi Party and the Ku Klux Klan; I know that you’ll tell me that he “can’t control who supports him”, and that he denounced² their support. I won’t tell you about the very real campaign of violence that has been carried out by his supporters in the mere days since his victory; a campaign that has even affected the behavior of children. I know you don’t believe there’s a connection there.

I think these are very real points to be made. But even if I agreed with you completely, that none of this was his fault, that none of this could have been prevented by his campaign, and that in his heart he’s not a hateful racist, I would still be just as scared.

Bear Sterns estimates that there are approximately 20 million illegal immigrants in the United States. Donald Trump’s official position on how to handle this population is mass deportation. He has promised that this will be done “warmly and humanely”, which betrays his total ignorance of how mass resettlements have happened in the past.

By contrast, the total combined number of active and reserve personnel in the United States Armed Forces is a little over 2 million people.

What do you imagine happens when a person is deported? A person who, as an illegal immigrant, very likely gave up everything they have in their home country, and wants to be where they are so badly that they risk arrest every day, just by living where they live? How do you think that millions of them returning to countries where they have no home, no food, and quite likely no money or access to the resources or support that they had while in the United States?

They die. They die of exposure because they are in poverty and all their possessions were just stripped away and they can no longer feed themselves, or because they were already refugees from political violence in their home country, or because their home country kills them at the border because it is a hostile action to suddenly burden an economy with the shock of millions of displaced (and therefore suddenly poor and unemployed, whether they were before or not) people.

A conflict between 20 million people on one side and 2 million (heavily armed) people on the other is not a “police action”. It cannot be done “warmly and humanely”. At best, such an action could be called a massacre. At worst (and more likely) it would be called a civil war. Individual deportees can be sent home without incident, and many have been, but the victims of a mass deportation will know what is waiting for them on the other side of that train ride. At least some of them won’t go quietly.

It doesn’t matter if this is technically enforcing “existing laws”. It doesn’t matter whether you think these people deserve to be in the country or not. This is just a reality of very, very large numbers.

Let’s say, just for the sake of argument, that of the population of immigrants has assimilated so poorly that each one knows only one citizen who will stand up to defend them, once it’s obvious that they will be sent to their deaths. That’s a hypothetical resistance army of 40 million people. Let’s say they are so thoroughly overpowered by the military and police that there are zero casualties on the other side of this. Generously, let’s say that the police and military are incredibly restrained, and do not use unnecessary overwhelming force, and the casualty rate is just 20%; 4 out of 5 people are captured without lethal force, and miraculously nobody else dies in the remaining 16 million who are sent back to their home countries.

That’s 8 million casualties.

6 million Jews died in the Holocaust.

This is why we are afraid. Forget all the troubling things about Trump’s character. Forget the coded racist language, the support of hate groups, and every detail and gaffe that we could quibble over as the usual chum of left/right political struggle in the USA. Forget his deeply concerning relationship with African-Americans, even.

We are afraid because of things that others have said about him, yes. But mainly, we are afraid because, in his own campaign, Trump promised to be 33% worse than Hitler.

I know that there are mechanisms in our democracy to prevent such an atrocity from occurring. But there are also mechanisms to prevent the kind of madman who would propose such a policy from becoming the President, and thus far they’ve all failed.

I’m not all that afraid for myself. I’m not a Muslim. I am a Jew, but despite all the swastikas painted on walls next to Trump’s name and slogans, I don’t think he’s particularly anti-Semitic. Perhaps he will even make a show of punishing anti-Semites, since he has some Jews in his family³.

I don’t even think he’s trying to engineer a massacre; I just know that what he wants to do will cause one. Perhaps, when he sees what is happening as a result of his orders, he will stop. But his character has been so erratic, I honestly have no idea.

I’m not an immigrant, but many in my family are. One of those immigrants is intimately familiar with the use of the word “deportation” as an euphemism for extermination; there’s even a museum about it where she comes from.

Her mother’s name is written in a book there.

In closing, I’d like to share a quote.

The last thing that my great-grandmother said to my grandmother, before she was dragged off to be killed by the Nazis, was this:

Pleure pas, les gens sont bons.

or, in English:

Don’t cry, people are good.

As it turns out, she was right, in a sense; thanks in large part to the help of anonymous strangers, my grandmother managed to escape, and, here I am.

My greatest hope for this upcoming regime change is that I am dramatically catastrophizing; that none of these plans will come to fruition, that the strange story⁴ I have been told by Trump supporters is in fact true.

But if my fears, if our fears, should come to pass – and the violence already in the streets already is showing that at least some of those fears will – you, my dear conservative, may find yourself at a crossroads. You may see something happening in your state, or your city, or even in your own home. Your children might use a racial slur, or even just tell a joke that you find troubling. You may see someone, even a policeman, beating a Muslim to death. In that moment, you will have a choice: to say something, or not. To be one of the good people, or not.

Please, be one of the good ones.

In the meanwhile, I’m going to try to take great-grandma’s advice.

When I say “we”, I mean, the people that you would call “liberals”, although our politics are often much more complicated than that; the people from “blue states” even though most states are closer to purple than pure blue or pure red; people of color, and immigrants, and yes, Jews. ↩
Eventually. ↩
While tacitly allowing continued violence against Muslims, of course. ↩
“His campaign is really about campaign finance”, “he just said that stuff to get votes, of course he won’t do it”, “they’ll be better off in their home countries”, and a million other justifications. ↩

November 12, 2016 02:33 AM

November 11, 2016

Import Python

ImportPython Issue 98 - Introduction to airflow becoming pdb power user and more

Worthy Read

A Practical Introduction to Airflow

Airflow is a popular pipeline orchestration tool for Python that allows users to configure complex (or simple!) multi-system workflows that are executed in parallel across any number of workers. A single pipeline might contain bash, Python, and SQL operations. With dependencies specified between tasks, Airflow knows which ones it can run in parallel and which ones must run after others. Airflow is written in Python and users can add their own operators with custom functionality, doing anything Python can do.
video
, workflow engine

SendGrid

Email API from SendGrid. Reliably deliver your emails with a quick and simple API or SMTP integration. Try for Free. Curator's Note - Python and Django integration for sendgrid https://github.com/sendgrid/sendgrid-python and https://github.com/RyanBalfanz/django-sendgrid respectively. You can send 12,000 emails per month free.

Sponsor

Timing Tests in Python for Fun and Profit

I was preparing to push some changes a couple of days ago and as I usually do, I ran the tests. I sat back in my chair as the dots raced across the screen when suddenly I noticed that one of the dots linger. ”OS is probably running some updates in the background or something” I said to myself, and ran the tests again just to be sure. I watched closely as the dots filled the screen and there it was again?—?I have a slow test!

Become a pdb power-user

Good Tutorial on using pdb.
pdb

Tutorial proposals are due in three weeks

Talk proposals will be due on 2017 January 3.Poster proposals will be due on 2017 January 3.Tutorial proposals are due on 2017 November 30. Yes, that’s right — tutorial proposals are due in three weeks.
pycon

awesome-asyncio

A curated list of awesome Python asyncio frameworks, libraries, software and resources.
async-io
, curated list

Some thoughts on asynchronous API design in a post-async/await world

I've recently been exploring the exciting new world of asynchronous I/O libraries in Python 3 – specifically asyncio and curio. These two libraries make some different design choices. This is an essay that I wrote to try to explain to myself what those differences are and why I think they matter, and distill some principles for designing event loop APIs and asynchronous libraries in Python.
async-io

django-flat-responsive

An extension for Django admin that makes interface mobile friendly.
django

Great Dev - Meet Great Jobs

Try Hired and get in front of 4,000+ companies with one application. No more pushy recruiters, no more dead end applications and mismatched companies, Hired puts the power in your hands.

Sponsor

astpath

A command-line utility for querying Python ASTs using XPath syntax
opensource project

How to do distributed processing of Landsat data in Python ?

Cloud Dataflow provides a fully-managed, autoscaling, serverless execution environment for data pipelines written in Apache Beam. In this article Lak Lakshmanan and Matt Hancher show us how to create a monthly vegetation index from Landsat images, available as a public dataset.
google cloud

Jobs

Senior/Lead Software Engineer (Python, Django) at Salt

London, United Kingdom

We are working with a start-up who already has an established product, customer base and deliver cutting edge software solutions to some of the largest global media companies as well as some corporates. All the solutions are based around video and media so any experience in this area is a massive bonus.

Projects

nathan - 108 Stars, 18 Fork

Android Emulator for mobile security testing

byteNet-tensorflow - 92 Stars, 11 Fork

ByteNet for character-level language modelling

word_forms - 47 Stars, 1 Fork

Accurately generate all possible forms of an English word e.g "election" --> "elect", "electoral", "electorate" etc.

foss-heartbeat - 25 Stars, 4 Fork

FOSS Heartbeat analyses the health of a community of contributors. :heartbeat:

reprint - 10 Stars, 0 Fork

A simple module for Python 2/3 to print and refresh multi line output contents in terminal

pyjet - 8 Stars, 1 Fork

JET is a different approach to make numeric python substantially faster

Batch-Image-Downloader - 7 Stars, 5 Fork

A simple Batch Image Downloader using Python and BeautifulSoup.

ipynb - 6 Stars, 1 Fork

Package / Module importer for importing code from Jupyter Notebook files (.ipynb)

slackbridge - 2 Stars, 0 Fork

Bridge between IRC and Slack

November 11, 2016 01:04 PM

eGenix.com

eGenix pyOpenSSL Distribution 0.13.16 GA

Introduction

The eGenix.com pyOpenSSL Distribution includes everything you need to get started with SSL in Python. It comes with an easy to use installer that includes the most recent OpenSSL library versions in pre-compiled
form, making your application independent of OS provided OpenSSL libraries:

>>> eGenix pyOpenSSL Distribution Page

pyOpenSSL is an open-source Python add-on that allows writing SSL-aware networking applications as as certificate managment tools. It uses the OpenSSL library as performant and robust SSL engine.

OpenSSL is an open-source implementation of the SSL/TLS protocol.

News

This new release of the eGenix.com pyOpenSSL Distribution includes the following updates:

New in OpenSSL

Upgraded the included OpenSSL libraries to 1.0.2j.

The OpenSSL 1.0.2 branch will receive long term support (LTS), so is an ideal basis for development. See https://www.openssl.org/news/secadv/20160926.txt for a complete list of security fixes in 1.0.2j. The following fixes are relevant for pyOpenSSL applications:

CVE-2016-6304 A malicious client can send an excessively large OCSP Status Request extension leading to a DoS attack.
CVE-2016-6306 Some missing message length checks can result in OOB reads, which could be used for DoS attacks.

Updated the Mozilla CA root bundle to the current version as of 2016-11-10.

Please see the product changelog for the full set of changes.

pyOpenSSL / OpenSSL Binaries Included

In addition to providing sources, we make binaries available that include both pyOpenSSL and the necessary OpenSSL libraries for all supported platforms: Windows, Linux, Mac OS X and FreeBSD, for x86 and x64.

To simplify installation, we have uploaded a web installer to PyPI which will automatically choose the right binary for your platform, so a simple

pip install egenix-pyopenssl

will get you the package with OpenSSL libraries installed. Please see our installation instructions for details.

We have also added .egg-file distribution versions of our eGenix.com pyOpenSSL Distribution for Windows, Linux and Mac OS X to the available download options. These make setups using e.g. zc.buildout and other egg-file based installers a lot easier.

Downloads

Please visit the eGenix pyOpenSSL Distribution page for downloads, instructions on installation and documentation of the package.

Upgrading

Before installing this version of pyOpenSSL, please make sure that you uninstall any previously installed pyOpenSSL version. Otherwise, you could end up not using the included OpenSSL libs.

More Information

For more information on the eGenix pyOpenSSL Distribution, licensing and download instructions, please write to [email protected].

Enjoy !

Marc-Andre Lemburg, eGenix.com

November 11, 2016 09:00 AM

Wesley Chun

Using the Google Slides API with Python

NOTE: The code covered in this post are also available in a video walkthrough however the code here differs slightly, featuring some minor improvements to the code in the video.

Introduction

One of the critical things developers have not been able to do previously was access Google Slides presentations programmatically. To address this "shortfall," the Slides team pre-announced their first API a few months ago at Google I/O 2016—also see full announcement video (40+ mins). Today, the G Suite product team announced that the API has officially launched (generally available), finally giving all developers access to build or edit Slides presentations.

In this post, I'll walk through a simple example featuring an existing Slides presentation template with a single slide. On this slide are placeholders for a presentation name and company logo, as illustrated below:

One of the obvious use cases that will come to mind is to take a presentation template replete with "variables" and placeholders, and auto-generate decks from the same source but created with different data for different customers. For example, here's what a "completed" slide would look like after the proxies have been replaced with "real data:"

Using the Google Slides API

We need to edit/write into a Google Slides presentation, meaning the read-write scope from all Slides API scopes below:

'https://www.googleapis.com/auth/presentations' — Read-write access to Slides and Slides presentation properties
'https://www.googleapis.com/auth/presentations.readonly' — View-only access to Slides presentations and properties
'https://www.googleapis.com/auth/drive' — Full access to users' files on Google Drive

Why is the Google Drive API scope listed above? Well, think of it this way: APIs like the Google Sheets and Slides APIs were created to perform spreadsheet and presentation operations. However, importing/exporting, copying, and sharing are all file-based operations, thus where the Drive API fits in. If you need a review of its scopes, check out the Drive auth scopes page in the docs. Copying a file requires the full Drive API scope, hence why it's listed above. If you're not going to copy any files and only performing actions with the Slides API, you can of course leave it out.

Since we've fully covered the authorization boilerplate fully in earlier posts and videos, we're going to skip that here and jump right to the action.

Getting started

What are we doing in today's code sample? We start with a slide template file that has "variables" or placeholders for a title and an image. The application code will go then replace these proxies with the actual desired text and image, with the goal being that this scaffolding will allow you to automatically generate multiple slide decks but "tweaked" with "real" data that gets substituted into each slide deck.

The title slide template file is TMPFILE, and the image we're using as the company logo is the Google Slides product icon whose filename is stored as the IMG_FILE variable in my Google Drive. Be sure to use your own image and template files! These definitions plus the scopes to be used in this script are defined like this:

IMG_FILE = 'google-slides.png'     # use your own!
TMPLFILE = 'title slide template'  # use your own!
SCOPES = (
    'https://www.googleapis.com/auth/drive',
    'https://www.googleapis.com/auth/presentations',
)

Skipping past most of the OAuth2 boilerplate, let's move ahead to creating the API service endpoints. The Drive API name is (of course) 'drive', currently on 'v3', while the Slides API is 'slides' and 'v1' in the following call to create a signed HTTP client that's shared with a pair of calls to the apiclient.discovery.build() function to create the API service endpoints:

HTTP = creds.authorize(Http())
DRIVE =  discovery.build('drive',  'v3', http=HTTP)
SLIDES = discovery.build('slides', 'v1', http=HTTP)

Copy template file

The first step of the "real" app is to find and copy the template file TMPLFILE. To do this, we'll use DRIVE.files().list() to query for the file, then grab the first match found. Then we'll use DRIVE.files().copy() to copy the file and name it 'Google Slides API template DEMO':

rsp = DRIVE.files().list(q="name='%s'" % TMPLFILE).execute().get('files')[0]
DATA = {'name': 'Google Slides API template DEMO'}
print('** Copying template %r as %r' % (rsp['name'], DATA['name']))
DECK_ID = DRIVE.files().copy(body=DATA, fileId=rsp['id']).execute().get('id')

Find image placeholder

Next, we'll ask the Slides API to get the data on the first (and only) slide in the deck. Specifically, we want the dimensions of the image placeholder. Later on, we will use those properties when replacing it with the company logo, so that it will be automatically resized and centered into the same spot as the image placeholder.
The SLIDES.presentations().get() method is used to read the presentation metadata. Returned is a payload consisting of everything in the presentation, the masters, layouts, and of course, the slides themselves. We only care about the slides, so we get that from the payload. And since there's only one slide, we grab it at index 0. Once we have the slide, we're loop through all of the elements on that page and stop when we find the rectangle (image placeholder):

print('** Get slide objects, search for image placeholder')
slide = SLIDES.presentations().get(presentationId=DECK_ID
       ).execute().get('slides')[0]
obj = None
for obj in slide['pageElements']:
    if obj['shape']['shapeType'] == 'RECTANGLE':
        break

Find image file

At this point, the obj variable points to that rectangle. What are we going to replace it with? The company logo, which we now query for using the Drive API:

print('** Searching for icon file')
rsp = DRIVE.files().list(q="name='%s'" % IMG_FILE).execute().get('files')[0]
print(' - Found image %r' % rsp['name'])
img_url = '%s&access_token=%s' % (
        DRIVE.files().get_media(fileId=rsp['id']).uri, creds.access_token)

The query code is similar to when we searched for the template file earlier. The trickiest thing about this snippet is that we need a full URL that points directly to the company logo. We use the DRIVE.files().get_media() method to create that request but don't execute it. Instead, we dig inside the request object itself and grab the file's URI and merge it with the current access token so what we're left with is a valid URL that the Slides API can use to read the image file and create it in the presentation.

Replace text and image

Back to the Slides API for the final steps: replace the title (text variable) with the desired text, add the company logo with the same size and transform as the image placeholder, and delete the image placeholder as it's no longer needed:

print('** Replacing placeholder text and icon')
reqs = [
    {'replaceAllText': {
        'containsText': {'text': '{{NAME}}'},
        'replaceText': 'Hello World!'
    }},
    {'createImage': {
        'url': img_url,
        'elementProperties': {
            'pageObjectId': slide['objectId'],
            'size': obj['size'],
            'transform': obj['transform'],
        }
    }},
    {'deleteObject': {'objectId': obj['objectId']}},
]
SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=DECK_ID).execute()
print('DONE')

Once all the requests have been created, send them to the Slides API then let the user know everything is done.

Conclusion

That's the entire script, just under 60 lines of code. If you watched the video, you may notice a few minor differences in the code. One is use of the fields parameter in the Slides API calls. They represent the use of field masks, which is a separate topic on its own. As you're learning the API now, it may cause unnecessary confusion, so it's okay to disregard them for now. The other difference is an improvement in the replaceAllText request—the old way in the video is now deprecated, so go with what we've replaced it with in this post.

If your template slide deck and image is in your Google Drive, and you've modified the filenames and run the script, you should get output that looks something like this:

$ python3 slides_template.py
** Copying template 'title slide template' as 'Google Slides API template DEMO'
** Get slide objects, search for image placeholder
** Searching for icon file
 - Found image 'google-slides.png'
** Replacing placeholder text and icon
DONE

Below is the entire script for your convenience which runs on both Python 2 and Python 3 (unmodified!). If I were to divide the script into major sections, they would be:

Get creds & build API service endpoints
Copy template file
Get image placeholder size & transform (for replacement image later)
Get secure URL for company logo
Build and send Slides API requests to...

Replace slide title variable with "Hello World!"
Create image with secure URL using placeholder size & transform
Delete image placeholder

Here's the complete script—by using, copying, and/or modifying this code or any other piece of source from this blog, you implicitly agree to its Apache2 license:

from __future__ import print_function

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

IMG_FILE = 'google-slides.png'      # use your own!
TMPLFILE = 'title slide template'   # use your own!
SCOPES = (
    'https://www.googleapis.com/auth/drive',
    'https://www.googleapis.com/auth/presentations',
)
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
HTTP = creds.authorize(Http())
DRIVE  = discovery.build('drive',  'v3', http=HTTP)
SLIDES = discovery.build('slides', 'v1', http=HTTP)

rsp = DRIVE.files().list(q="name='%s'" % TMPLFILE).execute().get('files')[0]
DATA = {'name': 'Google Slides API template DEMO'}
print('** Copying template %r as %r' % (rsp['name'], DATA['name']))
DECK_ID = DRIVE.files().copy(body=DATA, fileId=rsp['id']).execute().get('id')

print('** Get slide objects, search for image placeholder')
slide = SLIDES.presentations().get(presentationId=DECK_ID,
        fields='slides').execute().get('slides')[0]
obj = None
for obj in slide['pageElements']:
    if obj['shape']['shapeType'] == 'RECTANGLE':
        break

print('** Searching for icon file')
rsp = DRIVE.files().list(q="name='%s'" % IMG_FILE).execute().get('files')[0]
print(' - Found image %r' % rsp['name'])
img_url = '%s&access_token=%s' % (
        DRIVE.files().get_media(fileId=rsp['id']).uri, creds.access_token)

print('** Replacing placeholder text and icon')
reqs = [
    {'replaceAllText': {
        'containsText': {'text': '{{NAME}}'},
        'replaceText': 'Hello World!'
    }},
    {'createImage': {
        'url': img_url,
        'elementProperties': {
            'pageObjectId': slide['objectId'],
            'size': obj['size'],
            'transform': obj['transform'],
        }
    }},
    {'deleteObject': {'objectId': obj['objectId']}},
]
SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=DECK_ID).execute()
print('DONE')

As with our other code samples, you can now customize it to learn more about the API, integrate into other apps for your own needs, for a mobile frontend, sysadmin script, or a server-side backend!

EXTRA CREDIT: 1) EASIER: Add more text variables and replace them too. 2) HARDER: Change the image-based image placeholder to a text-based image placeholder, say a textbox with the text, "{{COMPANY_LOGO}}" and use the replaceAllShapesWithImage request to perform the image replacement. By making this one change, your code should be simplified from the image-based image replacement solution we used in this post.

November 11, 2016 12:41 AM

November 10, 2016

Gaël Varoquaux

Data science instrumenting social media for advertising is responsible for todays politics

To my friends developing data science for the social media, marketing, and advertising industries,

It is time to accept that we have our share of responsibility in the outcome of the US elections and the vote on Brexit. We are not creating the society that we would like. Facebook, Twitter, targeted advertising, customer profiling, are harmful to truth and have helped Brexiting and electing Trump. Journalism has been replaced by social media and commercial content tailored to influence the reader: your own personal distorted reality.

There are many deep reasons why Trump won the election. Here, as a data scientist, I want to talk about the factors created by data science.

Rumor replaces truth: the way we, data-miners, aggregate and recommend content is based on its popularity, on readership statistics. In no way is it based in the truthfulness of the content. As a result, Facebook, Twitter, Medium, and the like amplify rumors and sensational news, with no reality check [1].

This is nothing new: clickbait and tabloids build upon it. However, social networking and active recommendation makes things significantly worst. Indeed, birds of a feather flock together, reinforcing their own biases. We receive filtered information: have you noticed that every single argument you heard was overwhelmingly against (or in favor of) Brexit? To make matters even worse, our brain loves it: to resolve cognitive dissonance we avoid information that contradicts our biases [2].

Note

We all believe more information when it confirms our biases

Gossiping, rumors, and propaganda have always made sane decisions difficult. The filter bubble, algorithmically-tuned rose-colored glasses of Facebook, escalate this problem into a major dysfunction of our society. They amplify messy and false information better than anything before. Soviet-style propaganda builds on a carefully-crafted lies; post-truth politics build on a flood of information that does not even pretend to be credible in the long run.

Active distortion of reality: amplifying biases to the point that they drown truth is bad. Social networks actually do worse: they give tools for active manipulation of our perception of the world. Indeed, the revenue of today’s Internet information engines comes from advertising. For this purpose they are designed to learn as much as possible about the reader. Then they sell this information bundled with a slot where the buyer can insert the optimal message to influence the reader.

The Trump campaign used targeted Facebook ads presenting to unenthusiastic democrats information about Clinton tuned to discourage them from voting. For instance, portraying her as racist to black voters.

Information manipulation works. The Trump campaign has been a smearing campaign aimed at suppressing votes of his opponent. Release of negative information on Clinton did affect her supporter allegiance.

Tech created the perfect mind-control tool, with an eyes on sales revenue. Someone used it for politics.

The tech industry is mostly socially-liberal and highly educated, wishing the best for society. But it must accept its share of the blame. My friends improving machine-learning for costumer profiling and add placement, you help shaping a world of lies and deception. I will not blame you for accepting this money: if it were not for you, others would do it. But we should all be thinking about how do we improve this system. How do we use data science to build a world based on objectivity, transparency, and truth, rather than Internet-based marketing?

References analysing the erosion of truth

Must-read article in the economist on lies in politics
Wikipedia page on Post-truth politics
Donald Trump won because of Facebook
The real story behind todays referendum : Neil Lawrence’s analysis of the filter-bublle effect in Brexit
A 2013 academic study showing that twitter increases partisan polarization

Disgression: other social issues of data science

The tech industry is increasing inequalities, making the rich richer and leaving the poor behind. Data-science, with its ability to automate actions and wield large sources of information, is a major contributor to these sources of inequalities.
Internet-based marketing is building a huge spying machine that infers as much as possible about the user. The Trump campaign was able to target a specific population, black voters leaning towards democrats. What if this data was used for direct executive action? This could come quicker than we think, given how intelligence agencies tap into social media.

I preferred to focus this post on how data-science can help distort truth. Indeed, it is a problem too often ignored by data scientists who like to think that they are empowering users.

In memory of Aaron Schwartz who fought centralized power on Internet.

[1]	Facebook was until recently using human curators, but fired them, leading to a loss of control on veracity

[2]	It is a well-known and well-studied cognitive bias that individuals strive to reduce cognitive dissonace and actively avoid situations and information likely to increase it

November 10, 2016 11:00 PM

Brett Cannon

Why I took October off from OSS volunteering

November 10, 2016 07:55 PM

What to look for in a new TV

I'm kind of an A/V nerd. Now I'm not hardcore enough to have a vinyl collection or have an amp for my TV, but all my headphones cost over $100 and I have a Sonos Playbar so I don't have to put up with crappy TV speakers. What I'm trying to say is that I care about the A/V equipment I use, but not to the extent that money is no object when it comes to my enjoyment of a movie (I'm not that rich and my wife would kill me if I spent that kind of money on electronics). That means I tend to research extensively before making a major A/V purchase since I don't do it very often and I want quality within reason which does not lend itself to impulse buying.

Prior to September 1, 2016, I had a 2011 Vizio television. It was 47", did 1080p, and had passive 3D. When I purchased the TV I was fresh out of UBC having just finished my Ph.D. so it wasn't top-of-the-line, but it was considered very good for the price. I was happy with the picture, but admittedly it wasn't amazing; the screen had almost a matte finish which led to horrible glare. I also rarely used the 3D in the television as 3D Blu-Ray discs always cost extra and so few movies took the time to actually film in 3D to begin with, instead choosing to do it in post-production (basically animated films and TRON: Legacy were all that we ever watched in 3D). And to top it all off, the TV took a while to turn on. I don't know what kind of LCB bulbs were in it, but they took forever to warm up and just annoyed me (yes, very much a first-world problem).

So when UHD came into existence I started to keep an eye on the technology and what television manufacturers were doing to incorporate the technology to entice people like me to upgrade. After two years of watching this space and one of the TVs I was considering having a one-day sale that knock 23% off the price, I ended up buying a 55" Samsung KS8000 yesterday. Since I spent so much time considering this purchase I figured I would try and distill what knowledge I have picked up over the years into a blog post so that when you decide to upgrade to UHD you don't have to start from zero knowledge like I did.

What to care about

First, you don't care about the resolution of the TV. All UHD televisions are 4K, so that's just taken care of for you. It also doesn't generally make a difference in the picture because most people sit too far away from their TV to make the higher resolution matter.

No, the one thing you're going to care about is HDR and everything that comes with it. And of course it can't be a simple thing to measure like size or resolution. Oh no, HDR has a bunch of parts to it that go into the quality of the picture: brightness, colour gamut, and format (yes, there's a format war; HD-DVD/Blu-Ray didn't teach the TV manufacturers a big enough lesson).

Brightness

A key part of HDR is the range of brightness to show what you frequently hear referred to as "inky blacks" and "bright whites". The way you get deep blacks and bright whites is by supporting a huge range of brightness. What you will hear about TVs is what their maximum nit is. Basically you're aiming for 1000 nits or higher for a maximum and as close to 0 as possible for a minimum.

Now of course this isn't as simple as it sounds as there's different technology being used to try and solve this problem.

LCD

Thanks to our computers I'm sure everyone reading this is familiar with LCD displays. But what you might not realize is how they exactly work. In a nutshell there are LED lightbulbs behind your screen that provides white light, and then the LCD pixels turn on and off the red/green/blue parts of themselves to filter out certain colours. So yeah, there are lightbulbs in your screen and how strong they are dictates how bright your TV screen will be.

Now the thing that comes into play here for brightness is how those LED bulbs are oriented in order to get towards that 0 nits for inky blacks. Typical screens are edge-list, which means there is basically a strip of LEDs on the edges of the TV that shine light towards the middle of the screen. This is fine and it's what screens have been working with for a while, but it does mean there's always some light behind the pixels so it's kind of hard to keep it from leaking out a little bit.

This is where local dimming comes in. Some manufacturers are now laying out the LED bulbs in an array/grid behind the screen instead of at the edges. What this allows is for the TV to switch dim an LED bulb if it isn't needed at full strength to illuminate a certain quadrant of the screen (potentially even switching off entirely). Obviously the denser the array, the more local dimming zones and thus the greater chance a picture with some black in it will be able to switch off an LED to truly get a dark black for that part of the screen. As for how often something you're watching is going to allow you to take advantage of such local dimming due to a dark area lining up within a zone is going to vary so this is going to be a personal call as to whether this makes a difference to you.

OLED

If I didn't have a budget and wanted the ultimate solution for getting the best blacks in a picture, I would probably have an OLED TV from LG. What makes these TVs so great is the fact that OLEDs are essentially pixels that provide their own light. What that means is if you want an OLED pixel to be black, you simply switch it off. Or to compare it to local dimming, it's as if every pixel was its own local dimming zone. So if you want truly dark blacks, OLED are the way to go. It also leads to better colours since the intensity of the pixel is consistent compared to an LCD where the brightness is affected by how far the pixel is from the LED bulb that's providing light to the pixel.

But the drawback is that OLED TVs only get so bright. Since each pixel has to generate its own light they can't reach really four-digit nit levels like the LCD TVs can. It's still much brighter than any HD TV, but OLED TVs don't match the maximum brightness of the higher-end LCD TVs.

So currently it's a race to see if LCDs can get their blacks down or if OLEDs can get their brightness up. But from what I have read, in 2016 your best bet is OLED if you can justify the cost to yourself (they are very expensive televisions).

Colour gamut

While having inky blacks and bright whites are nice, not everyone is waiting for Mad Max: Fury Road in black and white. That means you actually care about the rest of the rainbow, which means you care about the colour gamut of the TV for a specific colour space. TVs are currently trying to cover as much of the DCI-P3 colour space as possible right now. Maybe in a few years TVs will fully cover that colour space, at which point they will start worrying about Rec. 2020 (also called BT.2020), but there's still room in covering DCI-P3 before that's something to care about.

In the end colour gamut is probably not going to be something you explicitly shop for, but more of something to be aware of that you will possibly gain by going up in price on your television.

Formats

So you have your brightness and you have your colours, now you have to care about what format all of this information is stored in. Yes my friends, there's a new format war and it's HDR10 versus Dolby Vision. Now if you buy a TV from Vizio or LG then you don't have to care because they are supporting both formats. But if you consider any other manufacturer you need to decide on whether you care about Dolby Vision because everyone supports HDR10 these days but no one supports Dolby Vision at the moment except those two manufacturers.

There is one key reason that HRD10 is supported by all television makers: it's an open specification. By being free it doesn't cut into profits of TVs which obviously every manufacturer likes and is probably why HDR10 is the required HDR standard for Ultra Blu-Ray discs (Dolby Vision is supported on Ultra Blu-Ray, but not required). Dolby Vision, on the other hand, requires licensing fees paid to Dolby. Articles also consistently suggest that Dolby Vision requires new hardware which would also drive up costs of supporting Dolby Vision (best I can come up with is that since Dolby Vision is 12-bit and HDR10 is 10-bit that TVs typically use a separate chip for Dolby Vision processing).

Dolby Vision does currently have two things going for it over HDR10. One is that Dolby Vision is dynamic per frame while HDR10 is static. This is most likely a temporary perk, though, because HDR10 is gaining dynamic support sometime in the future.

Two is that Dolby Vision is part of an end-to-end solution from image capture to projection in the theatres. By making Dolby Vision then also work at home it allows for directors and editors to get the results they want for the cinema and then just pass those results along to your TV without extra work.

All of this is to say that Dolby Vision seems to be the better technology, but the overhead/cost of adding it to a TV along with demand will ultimately dictate whether it catches on. Luckily all TV manufacturers has agreed on the minimum standard of HDR10 so you won't be completely left out if you buy a TV from someone other than LG or Vizio.

Where to go for advice

When it comes time to buy a TV, I recommend Rtings.com for advice. They have a very nice battery of tests they put the TV through and give you nice level of detail on how they reached their scores for each test. They even provide the settings they used for their tests so you can replicate them at home.

You can also read what the Wirecutter is currently recommending. For me, though, I prefer Rtings.com and use the Wirecutter as a confirmation check if their latest TV round-up isn't too out-of-date.

Ultra HD Premium

If you want a very simple way to help choose a television, you can simply consider ones that are listed as Ultra HD Premium. That way you know the TV roughly meets a minimum set of specifications that are reasonable to want if you're spending a lot of money on a TV. The certification is new in 2016 and so there are not a ton of TVs yet that have the certification, but since TV manufacturers like having stamps on their televisions I suspect it will start to become a thing.

One thing to be aware of is that Vizio doesn't like the certification. Basically they have complained that the lack of standards around how to actually measure what the certification requires makes it somewhat of a moot point. That's a totally reasonable criticism and why using the certification as a filter for TVs consider is good, but to not blindly buy a TV just because it has Ultra HD Premium stamp of approval.

Why I chose my TV

Much like when I bought a soundbar, I had some restrictions placed upon me when considering what television I wanted. One, the TV couldn't be any larger than 55" (to prevent the TV from taking over the living room even though we should have a 65" based on the minimum distance people might sit from the TV). This immediately put certain limits on me as some model lines don't start until 65" like the Vizio Reference series. I also wasn't willing to spend CAD 4,000 on an LG, so that eliminated OLED from consideration. I also wanted HDR, so that eliminated an OLED that was only HD.

In the end it was between the 55" Samsung KS8000, 55" Vizio P-series, and the 50" Vizio P-series. The reason for the same Vizio model at different sizes is the fact that they use different display technology; the 50" has a VA display while the 55" has an IPS display. The former will have better colours but the latter has better viewing angles. Unfortunately I couldn't find either model on display here in Vancouver to see what kind of difference it made.

One other knock against the Vizio -- at least at 55" -- was that it wasn't very good in a bright room. That's a problem for us as our living room is north facing with a big window and the TV is perpendicular to those windows, so we have plenty of glare on the screen as the sun goes down. The Samsung, on the other hand, was rated to do better in a glare-heavy room. And thanks to a one-day sale it brought the price of the Samsung to within striking distance of the Vizio. So in the end with the price difference no longer a factor I decided to go with the TV that worked best with glare and maximized the size I could go with.

My only worry with my purchase is if Dolby Vision ends up taking hold and I get left in the cold somehow. But thanks to the HDR10 support being what Ultra Blu-Ray mandates I'm not terribly worried of being shut out entirely from HDR content. There's also hope that I might be able to upgrade my television in the future thanks to it using a Mini One Connect which breaks out the connections from the television. In other TVs the box is much bigger as it contains all of the smarts of the television, allowing future upgrades. There's a chance I will be able to upgrade the box to get Dolby Vision in the future, but that's just a guess at this point that it's even possible, let alone whether Samsung choose to add Do

It's been 48 hours with the TV and both Andrea and I are happy with the purchase; me because the picture is great, Andrea because I will now shut up about television technology in regards to a new TV purchase.

November 10, 2016 06:24 PM

Introducing Which Film

What I'm announcing

Today I'm happy to announce the public unveiling of Which Film! I'll discuss how the site came about and what drives it, but I thought I would first explain what it does: it's a website to help you choose what movie you and your family/friends should watch together. What you do is you go to the site, enter in the Trakt.tv usernames of everyone who wants to watch a film together (so you need at least two people and kept data like your watchlist and ratings on Trakt), and then Which Film cross-references everyone's watchlists and ratings to create a list of movies that people may want to watch together.

The list of movies is ranked based on a simple point scale. If a movie is on someone's watchlist it gets 4 points, movies rated 10 ⭐ get 3 points, 9 ⭐ get 2 points, and 8 ⭐ get 1 point. Everyone who participates contributes points and the movies are sorted from highest score to lowest. The reason for the point values is the assumption that watching a movie most people have not seen is the best, followed by a movies people rate very highly. In the case of ties, the movie seen longest ago (if ever) by anyone in the group is ranked higher than movies seen more recently by someone. That way there's a bigger chance someone will be willing to watch a movie again when everyone else wants to see it for the first time.

None of this is very fancy or revolutionary, but it's useful any time you get together with a group of friends to watch a film and you end up having a hard time choosing to watch. It can help even between spouses as it will identify movies both people want to watch, removing that particular point of contention.

The story behind Which Film

Now normally launching a new website wouldn't cause for any backstory, but this project has been under development for about six years, so there's a bit of history to it.

One fateful night ...

The inspiration for Which Film stemmed from one night when my co-creator Karl, his wife, my wife, and I got together and decided we wanted to watch a movie. This turned out to be quite an ordeal due to disparate tastes among all four of us. Karl and I thought that there had to be a better way to figure out a film we could all happily watch together. It didn't need to necessarily be something none of us had seen (although that was preferred), but it did need to be something that had a chance of making all of us happy if we chose to watch it.

This is when I realized that at least for me I had all of the relevant data to make such a decision on IMDb. I had been keeping my watchlist and ratings up-to-date on the site for years, to the point of amassing a watchlist over of 400 movies. Karl and I realized that had all four of us done that we could have cross-referenced the data and easily have found a film we all liked. Yes, it would require convincing everyone involved to keep track of what movies they wanted to see and rating movies that had seen, but we figured that wasn't an insurmountable problem. And so we decided we should code up a solution since we're both software developers.

You need an API, IMDb

But there was trouble with this project from the beginning. It turns out that while IMDb is happy for you to store your data on their servers, they don't exactly make it easy to get the data out. For instance, when I started looking into this they had two ways of getting to your data in some programmatic way: RSS and CSV files. The problem with RSS is that it was capped at (I believe) 200 entries, so I couldn't use it to access my entire data set. The issue with CSV was that you had to be logged in to download it. And the issue with both approaches was they were constantly broken for for different things simultaneously; when I looked into this last RSS was busted for one kind of list while CSV was broken for another. To top it all off the brokenness wasn't temporary, but lasted for lengths of time measured in months. That obviously doesn't work if you want to rely on the data and there's no official API (and IMDb at least used to aggressively go after anyone who use their name in a project).

Luckily I found Trakt. It has an API, it was accessible on a phone, and it wasn't ugly. The trick, though, was getting my data from IMDb to Trakt. Luckily there was a magical point when CSV exporting on IMDb worked for all of my lists, and so I downloaded the data and hacked together csv2trakt to migrate my data over (there is TraktRater for importing into Trakt as well, but at the time I had issues getting it to run on macOS).

What platform?

With my data moved over, we then had to choose what platform to have Which Film on. We toyed with the idea of doing a mobile app, but I'm an Android user and Karl is on iOS (and the same split for our wives), so that would have meant two apps. That didn't really appeal to either of us so we decided to do a website. We also consciously chose to do a single-page app to avoid maintaining a backend where would have to worry about uptime, potential server costs, etc. It also helps that there's a local company in Vancouver called Surge that does really nice static page hosting with a very reasonable free tier (when they get Let's Encrypt support I'll probably bump up to their paid tier if people actually end up using Which Film).

Choosing a programming language is never easy for me

Since we had found a website we were willing to ask people to use to store data, I had solved my data import problem, and we had decided on doing a website solution, next was what technology stack to use. The simple answer would have been Python, but for me that's somewhat boring since I obviously know Python. To make sure we both maximized our learning from this project we endeavoured to find a programming language neither of us had extensive experience in.

Eventually we settled on Dart. At the time we made this decision I worked at Google which is where Dart comes from, so I knew if I got really stuck with something I had internal resources to lean on. Karl liked the idea of using Dart because his game developer background appreciated the fact that Dart was looking into things like SIMD for performance. I also knew that Dart had been chosen by the ads product division at Google which meant it wasn't going anywhere. That also meant choosing Angular 2 was a simple decision since Google was using Dart with Angular 2 for products and so it would have solid Dart support.

But why six years?!?

As I have said, the site isn't complicated as you can tell from its source code, so you may be wondering why it took us six years before we could finish it. Well, since coming up with this idea I at least finished my Ph.D., moved five times between two countries,and worked for two different employers (if you don't count my Ph.D.). Karl had a similar busy life over the same timespan. And having me spend a majority of those six years in a different timezone didn't help facilitate discussions. At least we had plenty of time to think through various UX and design problems. ☺

If you give Which Film a try do let Karl and/or me know on Twitter (if you just want to see how the website works and you don't have a Trakt account you can use our usernames: brettcannon and kschmidt).

November 10, 2016 06:24 PM

Caktus Consulting Group

Common web site security vulnerabilities

I recently decided I wanted to understand better what Cross-Site Scripting and Cross-Site Request Forgery were, and how they compared to that classic vulnerability, SQL Injection.

I also looked into some ways that sites protect against those attacks.

Vulnerabilities

SQL Injection

SQL Injection is a classic vulnerability. It probably dates back almost to punch cards.

Suppose a program uses data from a user in a database query.

For example, the company web site lets users enter a name of an employee, free-form, and the site will search for that employee and display their contact information.

A naive site might build a SQL query as a string using code like this, including whatever the user entered as NAME:

"SELECT * FROM employees WHERE name LIKE '" + NAME + "'"

If NAME is "John Doe", then we get:

SELECT * FROM employees WHERE name LIKE 'John Doe'

which is fine. But suppose someone types this into the NAME field:

John Doe'; DROP TABLE EMPLOYEES;

then the site will end up building this query:

SELECT * FROM employees WHERE name LIKE 'John Doe'; DROP TABLE EMPLOYEES;'

which might delete the whole employee directory. It could instead do something less obvious but even more destructive in the long run.

This is called a SQL Injection attack, because the attacker is able to inject whatever they want into a SQL command that the site then executes.

Cross Site Scripting

Cross Site Scripting, or XSS, is a similar idea. If an attacker can get their Javascript code embedded into a page on the site, so that it runs whenever someone visits that page, then the attacker's code can do anything on that site using the privileges of the user.

For example, maybe an attacker posts a comment on a page that looks to users like:

Great post!

but what they really put in their comment was:

Great post!<script> do some nefarious Javascript stuff </script>

If the site displays comments by just embedding the text of the comment in the page, then whenever a user views the page, the browser will run the Javascript - it has no way to know this particular Javascript on the page was written by an attacker rather than the people running the site.

This Javascript is running in a page that was served by the site, so it can do pretty much anything the user who is currently logged in can do. It can fetch all their data and send it somewhere else, or if the user is particularly privileged, do something more destructive, or create a new user with similar privileges and send its credentials somewhere the bad guy can retrieve them and use them later, even after the vulnerability has been discovered and fixed.

So, clearly, a site that accepts data uploaded by users, stores it, and then displays it, needs to be careful of what's in that data.

But even a site that doesn't store any user data can be vulnerable. Suppose a site lets users search by going to http://example.com/search?q=somethingtosearchfor (Google does something similar to this), and then displays a page showing what the search string was and what the results were. An attacker can embed Javascript into the search term part of that link, put that link somewhere people might click on it, and maybe label it "Cute Kitten Pictures". When a user clicks the link to see the kittens, her browser visits the site and tries the search. It'll probably fail, but if the site embeds the search term in the results page unchanged (which Google doesn't do), the attacker's code will run.

Why is it called Cross-Site Scripting? Because it allows an attacker to run their script on a site they don't control.

CSRF

Cross Site Request Forgeries

The essence of a CSRF attack is a malicious site making a request to another site, the site under attack, using the current user's permissions.

That last XSS example could also be considered a CSRF attack.

As another, extreme example, suppose a site implemented account deletion by having a logged-in user visit (GET) /delete-my-account. Then all a malicious site would have to do is link to yoursite.com/delete-my-account and if a user who was logged into yoursite.com clicked the link, they'd make the /delete-my-account request and their account would be gone.

In a more sophisticated attack, a malicious site can build a form or make AJAX calls that do a POST or other request to the site under attack when a user visits the malicious site.

Protecting against vulnerabilities

Protections in the server and application

SQL Injection protection

Django's ORM, and most database interfaces I've seen, provide a way to specify parameters to queries directly, rather than having the programmer build the whole query as a string. Then the database API can do whatever is appropriate to protect against malicious content in the parameters.

XSS protection

Django templates apply "escaping" to all embedded content by default. This marks characters that ordinarily would be special to the browser, like "<", so that the browser will just display the "<" instead of interpreting it. That means if content includes "<SCRIPT>...</SCRIPT>", instead of the browser executing the "..." part, the user will just see "<SCRIPT>...</SCRIPT>" on the page.

CSRF protection

We obviously can't disable links to other sites - that would break the entire web. So to protect against CSRF, we have to make sure that another site cannot build any request to our site that would actually do anything harmful.

The first level of protection is simply making sure that request methods like GET don't change anything, or display unvalidated data. That blocks the simplest possible attack, where a simple link from another site causes harm when followed.

A malicious site can still easily build a form or make AJAX calls that do a POST or other request to the site under attack, so how do we protect against that?

Django's protection is to always include a user-specific, unguessable string as part of such requests, and reject any such request that doesn't include it. This string is called the CSRF token. Any form on a Django site that does a POST etc has to include it as one of the submitted parameters. Since the malicious site doesn't know the token, it cannot generate a malicious POST request that the Django site will pay any attention to.

Protections in the browser

Modern browsers implement a number of protections against these kinds of attacks.

"But wait", I hear you say. "How can I trust browsers to protect my application, when I have no control over the browser being used?"

I frequently have to remind myself that browser protections are designed to protect the user sitting in front of the browser, who for these attacks, is the victim, not the attacker. The user doesn't want their account hacked on your site any more than you do, and these browser protections help keep the attacker from doing that to the user, and incidentally to your site.

Same-origin security policy

All modern browsers implement a form of Same Origin Policy, which I'll call SOP. In some cases, it prevents a page loaded from one site from accessing resources on other sites, that is, resources that don't have the same origin.

The most important thing about SOP is that AJAX calls are restricted by default. Since an AJAX call can use POST and other data-modifying HTTP requests, and would send along the user's cookies for the target site, an AJAX call could do anything it wanted using the user's permissions on the target site. So browsers don't allow it.

What kind of attack does this prevent? Suppose the attacker sets up a site with lots of cute kitten pictures, and gets a user victim to access it. Without SOP, pages on that site could run Javascript that made AJAX calls (in the background) to the user's bank. Such calls would send along whatever cookies the user's browser had stored for the bank site, so the bank would treat them as coming from the user. But with SOP, the user's browser won't let those AJAX calls to another site happen. They can only talk to the attacker's own site, which doesn't do the attacker any good.

CSP

Content Security Policy (CSP)

CSP is a newer mechanism that browsers can use to better protect from these kinds of attacks.

If a response includes the CSP header, then by default the browser will not allow any inline javascript, CSS, or use of javascript "eval" on the page. This blocks many forms of XSS. Even if an attacker manages to trick the server into including malicious code on the page, the browser will refuse to execute it.

For example, if someone uploads a comment that includes a <script> tag with some Javascript, and the site includes that in the page, the browser just won't run the Javascript.

Conclusion

I've barely touched the surface on these topics here. Any web developer ought to have at least a general knowledge of common vulnerabilities, if only to know what areas might require more research on a given project.

A reasonable place to start is Django's Security Overview.

The OWASP Top Ten is a list of ten of the most commonly exploited vulnerabilities, with links to more information about each. The ones I've described here are numbers 1, 3, and 8 on the list, so you can see there are many more to be aware of.

November 10, 2016 05:47 PM

Semaphore Community

Dockerizing a Python Django Web Application

This article is brought with ❤ to you by Semaphore.

Introduction

This article will cover building a simple 'Hello World'-style web application written in Django and running it in the much talked about and discussed Docker. Docker takes all the great aspects of a traditional virtual machine, e.g. a self contained system isolated from your development machine, and removes many of the drawbacks such as system resource drain, setup time, and maintenance.

When building web applications, you have probably reached a point where you want to run your application in a fashion that is closer to your production environment. Docker allows you to set up your application runtime in such a way that it runs in exactly the same manner as it will in production, on the same operating system, with the same environment variables, and any other configuration and setup you require.

By the end of the article you'll be able to:

Understand what Docker is and how it is used,
Build a simple Python Django application, and
Create a simple Dockerfile to build a container running a Django web application server.

What is Docker, Anyway?

Docker's homepage describes Docker as follows:

"Docker is an open platform for building, shipping and running distributed applications. It gives programmers, development teams, and operations engineers the common toolbox they need to take advantage of the distributed and networked nature of modern applications."

Put simply, Docker gives you the ability to run your applications within a controlled environment, known as a container, built according to the instructions you define. A container leverages your machines resources much like a traditional virtual machine (VM). However, containers differ greatly from traditional virtual machines in terms of system resources. Traditional virtual machines operate using Hypervisors, which manage the virtualization of the underlying hardware to the VM. This means they are large in terms of system requirements.

Containers operate on a shared Linux operating system base and add simple instructions on top to execute and run your application or process. The difference being that Docker doesn't require the often time-consuming process of installing an entire OS to a virtual machine such as VirtualBox or VMWare. Once Docker is installed, you create a container with a few commands and then execute your applications on it via the Dockerfile. Docker manages the majority of the operating system virtualization for you, so you can get on with writing applications and shipping them as you require in the container you have built. Furthermore, Dockerfiles can be shared for others to build containers and extend the instructions within them by basing their container image on top of an existing one. The containers are also highly portable and will run in the same manner regardless of the host OS they are executed on. Portability is a massive plus side of Docker.

Prerequisites

Before you begin this tutorial, ensure the following is installed to your system:

Python 2.7 or 3.x,
Docker (Mac users: it's recommended to use docker-machine, available via Homebrew-Cask), and
A git repository to store your project and track changes.

Setting Up a Django web application

Starting a Django application is easy, as the Django dependency provides you with a command line tool for starting a project and generating some of the files and directory structure for you. To start, create a new folder that will house the Django application and move into that directory.

$ mkdir project
$ cd project

Once in this folder, you need to add the standard Python project dependencies file which is usually named requirements.txt, and add the Django and Gunicorn dependency to it. Gunicorn is a production standard web server, which will be used later in the article. Once you have created and added the dependencies, the file should look like this:

$ cat requirements.txt                                                              
Django==1.9.4
gunicorn==19.6.0

With the Django dependency added, you can then install Django using the following command:

$ pip install -r requirements.txt

Once installed, you will find that you now have access to the django-admin command line tool, which you can use to generate the project files and directory structure needed for the simple "Hello, World!" application.

$ django-admin startproject helloworld

Let's take a look at the project structure the tool has just created for you:

.
├── helloworld
│   ├── helloworld
│   │   ├── __init__.py
│   │   ├── settings.py
│   │   ├── urls.py
│   │   └── wsgi.py
│   └── manage.py
└── requirements.txt

You can read more about the structure of Django on the official website. django-admin tool has created a skeleton application. You control the application for development purposes using the manage.py file, which allows you to start the development test web server for example:

$ cd helloworld
$ python manage.py runserver

The other key file of note is the urls.py, which specifies what URL's route to which view. Right now, you will only have the default admin URL which we won't be using in this tutorial. Lets add a URL that will route to a view returning the classic phrase "Hello, World!".

First, create a new file called views.py in the same directory as urls.py with the following content:

from django.http import HttpResponse

def index(request):
    return HttpResponse("Hello, world!")

Now, add the following URL url(r'', 'helloworld.views.index') to the urls.py, which will route the base URL of / to our new view. The contents of the urls.py file should now look as follows:

from django.conf.urls import url
from django.contrib import admin

urlpatterns = [
    url(r'^admin/', admin.site.urls),
    url(r'', 'helloworld.views.index'),
]

Now, when you execute the python manage.py runserver command and visit http://localhost:8000 in your browser, you should see the newly added "Hello, World!" view.

The final part of our project setup is making use of the Gunicorn web server. This web server is robust and built to handle production levels of traffic, whereas the included development server of Django is more for testing purposes on your local machine only. Once you have dockerized the application, you will want to start up the server using Gunicorn. This is much simpler if you write a small startup script for Docker to execute. With that in mind, let's add a start.sh bash script to the root of the project, that will start our application using Gunicorn.

#!/bin/bash

# Start Gunicorn processes
echo Starting Gunicorn.
exec gunicorn helloworld.wsgi:application \
    --bind 0.0.0.0:8000 \
    --workers 3

The first part of the script writes "Starting Gunicorn" to the command line to show us that it is starting execution. The next part of the script actually launches Gunicorn. You use exec here so that the execution of the command takes over the shell script, meaning that when the Gunicorn process ends so will the script, which is what we want here.

You then pass the gunicorn command with the first argument of helloworld.wsgi:application. This is a reference to the wsgi file Django generated for us and is a Web Server Gateway Interface file which is the Python standard for web applications and servers. Without delving too much into WSGI, the file simply defines the application variable, and Gunicorn knows how to interact with the object to start the web server.

You then pass two flags to the command, bind to attach the running server to port 8000, which you will use to communicate with the running web server via HTTP. Finally, you specify workers which are the number of threads that will handle the requests coming into your application. Gunicorn recommends this value to be set at (2 x $num_cores) + 1. You can read more on configuration of Gunicorn in their documentation.

Finally, make the script executable, and then test if it works by changing directory into the project folder helloworld and executing the script as shown here. If everything is working fine, you should see similar output to the one below, be able to visit http://localhost:8000 in your browser, and get the "Hello, World!" response.

$ chmod +x start.sh
$ cd helloworld
$ ../start.sh
Starting Gunicorn.
[2016-06-26 19:43:28 +0100] [82248] [INFO]
Starting gunicorn 19.6.0
[2016-06-26 19:43:28 +0100] [82248] [INFO]
Listening at: http://0.0.0.0:8000 (82248)
[2016-06-26 19:43:28 +0100] [82248] [INFO]
Using worker: sync
[2016-06-26 19:43:28 +0100] [82251] [INFO]
Booting worker with pid: 82251
[2016-06-26 19:43:28 +0100] [82252] [INFO]
Booting worker with pid: 82252
[2016-06-26 19:43:29 +0100] [82253] [INFO]
Booting worker with pid: 82253

Dockerizing the Application

You now have a simple web application that is ready to be deployed. So far, you have been using the built-in development web server that Django ships with the web framework it provides. It's time to set up the project to run the application in Docker using a more robust web server that is built to handle production levels of traffic.

Installing Docker

One of the key goals of Docker is portability, and as such is able to be installed on a wide variety of operating systems.

For this tutorial, you will look at installing Docker Machine on MacOS. The simplest way to achieve this is via the Homebrew package manager. Instal Homebrew and run the following:

$ brew update && brew upgrade --all && brew cleanup && brew prune
$ brew install docker-machine

With Docker Machine installed, you can use it to create some virtual machines and run Docker clients. You can run docker-machine from your command line to see what options you have available. You'll notice that the general idea of docker-machine is to give you tools to create and manage Docker clients. This means you can easily spin up a virtual machine and use that to run whatever Docker containers you want or need on it.

You will now create a virtual machine based on VirtualBox that will be used to execute your Dockerfile, which you will create shortly. The machine you create here should try to mimic the machine you intend to run your application on in production. This way, you should not see any differences or quirks in your running application neither locally nor in a deployed environment.

Create your Docker Machine using the following command:

$ docker-machine create development --driver virtualbox
--virtualbox-disk-size "5000" --virtualbox-cpu-count 2
--virtualbox-memory "4096"

This will create your machine and output useful information on completion. The machine will be created with 5GB hard disk, 2 CPU's and 4GB of RAM.

To complete the setup, you need to add some environment variables to your terminal session to allow the Docker command to connect the machine you have just created. Handily, docker-machine provides a simple way to generate the environment variables and add them to your session:

$ docker-machine env development
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://123.456.78.910:1112"
export DOCKER_CERT_PATH="/Users/me/.docker/machine/machines/development"
export DOCKER_MACHINE_NAME="development"
# Run this command to configure your shell:
# eval "$(docker-machine env development)"

Complete the setup by executing the command at the end of the output:

$(docker-machine env development)

Execute the following command to ensure everything is working as expected.

$ docker images
REPOSITORY   TAG   IMAGE  ID   CREATED   SIZE

You can now dockerize your Python application and get it running using the docker-machine.

Writing the Dockerfile

The next stage is to add a Dockerfile to your project. This will allow Docker to build the image it will execute on the Docker Machine you just created. Writing a Dockerfile is rather straightforward and has many elements that can be reused and/or found on the web. Docker provides a lot of the functions that you will require to build your image. If you need to do something more custom on your project, Dockerfiles are flexible enough for you to do so.

The structure of a Dockerfile can be considered a series of instructions on how to build your container/image. For example, the vast majority of Dockerfiles will begin by referencing a base image provided by Docker. Typically, this will be a plain vanilla image of the latest Ubuntu release or other Linux OS of choice. From there, you can set up directory structures, environment variables, download dependencies, and many other standard system tasks before finally executing the process which will run your web application.

Start the Dockerfile by creating an empty file named Dockerfile in the root of your project. Then, add the first line to the Dockerfile that instructs which base image to build upon. You can create your own base image and use that for your containers, which can be beneficial in a department with many teams wanting to deploy their applications in the same way.

# Dockerfile

# FROM directive instructing base image to build upon
FROM python:2-onbuild

It's worth noting that we are using a base image that has been created specifically to handle Python 2.X applications and a set of instructions that will run automatically before the rest of your Dockerfile. This base image will copy your project to /usr/src/app, copy your requirements.txt and execute pip install against it. With these tasks taken care of for you, your Dockerfile can then prepare to actually run your application.

Next, you can copy the start.sh script written earlier to a path that will be available to you in the container to be executed later in the Dockerfile to start your server.

# COPY startup script into known file location in container
COPY start.sh /start.sh

Your server will run on port 8000. Therefore, your container must be set up to allow access to this port so that you can communicate to your running server over HTTP. To do this, use the EXPOSE directive to make the port available:

# EXPOSE port 8000 to allow communication to/from server
EXPOSE 8000

The final part of your Dockerfile is to execute the start script added earlier, which will leave your web server running on port 8000 waiting to take requests over HTTP. You can execute this script using the CMD directive.

# CMD specifcies the command to execute to start the server running.
CMD ["/start.sh"]
# done!

With all this in place, your final Dockerfile should look something like this:

# Dockerfile

# FROM directive instructing base image to build upon
FROM python:2-onbuild

# COPY startup script into known file location in container
COPY start.sh /start.sh

# EXPOSE port 8000 to allow communication to/from server
EXPOSE 8000

# CMD specifcies the command to execute to start the server running.
CMD ["/start.sh"]
# done!

You are now ready to build the container image, and then run it to see it all working together.

Building and Running the Container

Building the container is very straight forward once you have Docker and Docker Machine on your system. The following command will look for your Dockerfile and download all the necessary layers required to get your container image running. Afterwards, it will run the instructions in the Dockerfile and leave you with a container that is ready to start.

To build your container, you will use the docker build command and provide a tag or a name for the container, so you can reference it later when you want to run it. The final part of the command tells Docker which directory to build from.

$ cd <project root directory>
$ docker build -t davidsale/dockerizing-python-django-app .

Sending build context to Docker daemon 237.6 kB
Step 1 : FROM python:2-onbuild
# Executing 3 build triggers...
Step 1 : COPY requirements.txt /usr/src/app/
 ---> Using cache
Step 1 : RUN pip install --no-cache-dir -r requirements.txt
 ---> Using cache
Step 1 : COPY . /usr/src/app
 ---> 68be8680cbc4
Removing intermediate container 75ed646abcb6
Step 2 : COPY start.sh /start.sh
 ---> 9ef8e82c8897
Removing intermediate container fa73f966fcad
Step 3 : EXPOSE 8000
 ---> Running in 14c752364595
 ---> 967396108654
Removing intermediate container 14c752364595
Step 4 : WORKDIR helloworld
 ---> Running in 09aabb677b40
 ---> 5d714ceea5af
Removing intermediate container 09aabb677b40
Step 5 : CMD /start.sh
 ---> Running in 7f73e5127cbe
 ---> 420a16e0260f
Removing intermediate container 7f73e5127cbe
Successfully built 420a16e0260f

In the output, you can see Docker processing each one of your commands before outputting that the build of the container is complete. It will give you a unique ID for the container, which can also be used in commands alongside the tag.

The final step is to run the container you have just built using Docker:

$ docker run -it -p 8000:8000 davidsale/djangoapp1
Starting Gunicorn.
[2016-06-26 19:24:11 +0000] [1] [INFO]
Starting gunicorn 19.6.0
[2016-06-26 19:24:11 +0000] [1] [INFO]
Listening at: http://0.0.0.0:9077 (1)
[2016-06-26 19:24:11 +0000] [1] [INFO]
Using worker: sync
[2016-06-26 19:24:11 +0000] [11] [INFO]
Booting worker with pid: 11
[2016-06-26 19:24:11 +0000] [12] [INFO]
Booting worker with pid: 12
[2016-06-26 19:24:11 +0000] [17] [INFO]
Booting worker with pid: 17

The command tells Docker to run the container and forward the exposed port 8000 to port 8000 on your local machine. After you run this command, you should be able to visit http://localhost:8000 in your browser to see the "Hello, World!" response. If you were running on a Linux machine, that would be the case. However, if running on MacOS, then you will need to forward the ports from VirtualBox, which is the driver we use in this tutorial so that they are accessible on your host machine.

$ VBoxManage controlvm "development" natpf1
  "tcp-port8000,tcp,,8000,,8000";

This command modifies the configuration of the virtual machine created using docker-machine earlier to forward port 8000 to your host machine. You can run this command multiple times changing the values for any other ports you require.

Once you have done this, visit http://localhost:8000 in your browser. You should be able to visit your dockerized Python Django application running on a Gunicorn web server, ready to take thousands of requests a second and ready to be deployed on virtually any OS on planet using Docker.

Next Steps

After manually verifying that the appication is behaving as expected in Docker, the next step is the deployment. You can use Semaphore's Docker platform for automating this process.

Conclusion

In this tutorial, you have learned how to build a simple Python Django web application, wrap it in a production grade web server, and created a Docker container to execute your web server process.

If you enjoyed working through this article, feel free to share it and if you have any questions or comments leave them in the section below. We will do our best to answer them, or point you in the right direction.

This article is brought with ❤ to you by Semaphore.

November 10, 2016 01:54 PM

Mocks and Monkeypatching in Python

This article is brought with ❤ to you by Semaphore.

Post originally published on http://krzysztofzuraw.com/. Republished with author's permission.

Introduction

In this post I will look into the essential part of testing — mocks.

First of all, what I want to accomplish here is to give you basic examples of how to mock data using two tools — mock and pytest monkeypatch.

Why bother mocking?

Some of the parts of our application may have dependencies for other libraries or objects. To isolate the behaviour of our parts, we need to substitute external dependencies. Here comes the mocking. We mock an external API to check certain behaviours, such as proper return values, that we previously defined.

Mocking function

Let’s say we have a module called function.py:

def square(value):
    return value ** 2

def cube(value): 
    return value ** 3

def main(value): 
    return square(value) + cube(value)

Then let’s see how these functions are mocked using the mock library:

    try:
        import mock
    except ImportError:
        from unittest import mock

    import unittest

    from function import square, main


    class TestNotMockedFunction(unittest.TestCase):

        @mock.patch('__main__.square', return_value=1)
        def test_function(self, mocked_square):
            # because you need to patch in exact place where function that has to be mocked is called
            self.assertEquals(square(5), 1)

        @mock.patch('function.square')
        @mock.patch('function.cube')
        def test_main_function(self, mocked_square, mocked_cube):
            # underling function are mocks so calling main(5) will return mock
            mocked_square.return_value = 1
            mocked_cube.return_value = 0
            self.assertEquals(main(5), 1)
            mocked_square.assert_called_once_with(5)
            mocked_cube.assert_called_once_with(5)


    if __name__ == '__main__':
        unittest.main()

What is happening here? Lines 1-4 are for making this code compatible between Python 2 and 3. In Python 3, mock is part of the standard library, whereas in Python 2 you need to install it by pip install mock.

In line 13, I patched the square function. You have to remember to patch it in the same place you use it. For instance, I’m calling square(5) in the test itself so I need to patch it in __main__. This is the case if I’m running this by using python tests/test_function.py. If I’m using pytest for that, I need to patch it as test_function.square.

In lines 18-19, I patch the square and cube functions in their module because they are used in the main function. The last two asserts come from the mock library, and are there to make sure that mock was called with proper values.

The same can be accomplished using mokeypatching for py.test:

from function import square, main

def test_function(monkeypatch):
    monkeypatch.setattr(“test_function_pytest.square”, lambda x: 1)
    assert square(5) == 1

def test_main_function(monkeypatch): 
    monkeypatch.setattr(‘function.square’, lambda x: 1) 
    monkeypatch.setattr(‘function.cube’, lambda x: 0) 
    assert main(5) == 1

As you can see, I’m using monkeypatch.setattr for setting up a return value for given functions. I still need to monkeypatch it in proper places — test_function_pytest and function.

Mocking classes

I have a module called square:

import math

class Square(object): 
    def __init__(radius): 
        self.radius = radius

        def calculate_area(self):
            return math.sqrt(self.radius) * math.pi

and mocks using standard lib:

try: 
    import mock 
except ImportError: 
    from unittest import mock

import unittest

from square import Square

class TestClass(unittest.TestCase):

       @mock.patch('__main__.Square') # depends in witch from is run
       def test_mocking_instance(self, mocked_instance):
           mocked_instance = mocked_instance.return_value
           mocked_instance.calculate_area.return_value = 1
           sq = Square(100)
           self.assertEquals(sq.calculate_area(), 1)


       def test_mocking_classes(self):
           sq = Square
           sq.calculate_area = mock.MagicMock(return_value=1)
           self.assertEquals(sq.calculate_area(), 1)

       @mock.patch.object(Square, 'calculate_area')
       def test_mocking_class_methods(self, mocked_method):
           mocked_method.return_value = 20
           self.assertEquals(Square.calculate_area(), 20)

if __name__ == ‘__main__’:
    unittest.main()

At line 13, I patch the class Square. Lines 15 and 16 present a mocking instance. mocked_instance is a mock object which returns another mock by default, and to these mock.calculate_area I add return_value 1. In line 23, I’m using MagicMock, which is a normal mock class, except in that it also retrieves magic methods from the given object. Lastly, I use patch.object to mock the method in the Square class.

The same using pytest:

try: 
    from mock import MagicMock 
except ImportError: 
    from unittest.mock import MagicMock

from square import Square

def test_mocking_class_methods(monkeypatch):
    monkeypatch.setattr('test_class_pytest.Square.calculate_area', lambda: 1)
    assert Square.calculate_area() ==  1


def test_mocking_classes(monkeypatch):
    monkeypatch.setattr('test_class_pytest.Square', MagicMock(Square))
    sq = Square
    sq.calculate_area.return_value = 1
    assert sq.calculate_area() ==  1

The issue here is with test_mocking_class_methods, which works well in Python 3, but not in Python 2.

All examples can be found in this repo.

If you have any questions and comments, feel free to leave them in the section below.

References:

This article is brought with ❤ to you by Semaphore.

November 10, 2016 01:35 PM

Codementor

Building a Chatbot using Telegram and Python (Part 1)

Chatbots are all the rage at the moment, with some predicting that they will be bigger than mobile apps. The main idea of chatbots is that instead of having to dig through awkward mobile menus and learn UIs, you’ll simply have a conversation with a bot through a familiar
instant messaging interface. If you want to order a Pizza, you start a conversation with the Domino’s Pizza bot and have the same conversation with it that you might have with a human.

There are a few different platforms that allow you to build your own chatbot. One of these, which is arguably the simplest to use and is also growing steadily in popularity, is Telegram.

In this tutorial, we’ll walk through building a simple Telegram Bot using Python. At first, our bot will simply echo back any message we send it, but then we’ll extend it to add a database and persist information across chat sessions.

We’ll use Python to power our Bot and SQLite to store information persistently across sessions. In summary, this is a tutorial series that will:

Show you how to write a simple Echo Bot from scratch using Python and the Telegram Bot API (Part 1)
Extend the Echo Bot into a ToDo list manager bot, backed by a SQLite database (Part 2)
Show how to run our Bot from a VPS and allow it to scale to more users (Part 3).

Although creating an Echo Bot is simple enough, and you can find various scripts and frameworks online that will give you this as a starting point—we will do everything from scratch and explain every piece of code we write. We’ll also look at some subtleties in the Telegram API, and talk about what these mean for us as developers. If you just want to create a Telegram bot as quickly as possible, this tutorial is probably not what you’re looking for, but if you want to gain a deeper understanding of how chatbots work and how to build one from scratch, then you’re in the right place.

What you need

You’ll need to have:

Some basic Python knowledge to follow this tutorial
You should be comfortable with running commands in a Linux Shell, a MacOS Terminal, or a Windows Command Prompt
You should be able to install Python packages using the pip package manager (or conda if you’re more comfortable with that)
Ideally, you should have written at least a basic SQL statement before, but this is not strictly necessary (and will only be relevant in Part 2).

All of the code is aimed at Python 3.5, but it should be easily adaptable to other versions of Python.

Why Python?

You can write a Telegram chat bot in any language you want. Some of the main options apart from Python would be Java, PHP, or Ruby. If you are more familiar with a different high-level programming language, then you might prefer to use that instead, but Python is a good choice for several reasons:

Python can make HTTP requests very concisely and simply through the requests module. Getting the content from a URL (which is how we’ll be controlling our Telegram Bot) would need many more lines of Java than the Python equivalent.
Python is the most popular language for natural language processing and machine learning: although we won’t be using either of these for our simple bot, both of them would be necessary for a more advanced Bot. Thus, if you want to extend the Bot, it’s good to get comfortable with Python.
Python has good support for serving web content: when we want to scale up our Bot to allow it to receive many messages per second, Python has mature technologies such as WSGI to reach “web scale”.
Python is portable—we can easily run the same code on Linux, MacOS, or Windows.

Setting up

Nearly everything we do will be achievable using only the standard Python libraries, but we’ll also be using the third-party requests module which provides a number of improvements to Python’s urllib, and allows us to make HTTP requests very simply and concisely. Install this through pip using a command similar to the following (you may need to use pip instead of pip3 and/or add the --user flag, based on how you usually install Python libraries).

pip3 install requests

If you usually use a virtual environment for new Python projects, then set one of those up first, and install requests inside that.

Creating a Telegram Bot

The first step is to tell Telegram that you want to create a new bot. All the messages that our Bot sends and receives will go through Telegram’s infrastructure. Our code will periodically make a request to retrieve all new messages to our Bot from Telegram’s servers, and will then send responses to each message as necessary. In order to register a bot with Telegram, you first need to create a personal Telegram account. Visit web.telegram.org and enter your phone number. Telegram will send you a text message (SMS), and you can then create an account by following the instructions on the screen. If you already have a Telegram account, then you can simply use that one, and you can also use any of the Telegram Desktop and Mobile apps available from telegram.org, instead of the Web app that we’ll be using for all examples in this tutorial.

Once you have a Telegram account, you can register a new Telegram Bot by using Bot Father. Visit telegram.me/botfather to start a conversation with Telegram’s bot that creates other bots. Telegram bots can receive messages or commands. The former are simply text that you send as if you were sending a message to another person, while the latter are prefixed with a / character. To create a new bot, send the following command to Bot Father as a chat (exactly as if you were talking to another person on Telegram).

/newbot

You should get a reply instantly that asks you to choose a name for your Bot. We’ll call our Bot To Do Bot because, by the end of this tutorial, it’ll function as a simple “to do” list. Send the following message to Bot Father when it prompts you for a name:

To Do Bot

Bot Father will now ask you to pick a username for your Bot. This username has to end in bot, and be globally unique. As Telegram has grown more popular, it has become more difficult to find a short and relevant username for your Bot. In this tutorial, we’ll be using exampletodo_bot, but for the rest of this tutorial, we’ll indicate the Bot’s username with <your-bot-username>, so you’ll have to substitute your chosen username wherever relevant from now on. Send your chosen username to Bot Father:

<your-bot-username>

Now Bot Father will send you a “Congratulations” message, which will include a token. The token should look something like this:

2483457814:AAHrlCx234_VskzWEJdWjTsdfuwejHyu5mI

For the rest of this tutorial, we’ll indicate where you need to put your token by using <your-bot-token>.

Take note of the token, as we’ll need it in the code that we’re about to write.

Interacting with our Bot through our web browser

We can control our Bot by sending HTTPS requests to Telegram. This means that the simplest way to interact with our Bot is through a web browser. By visiting different URLs, we send different commands to our Bot. The simplest command is one where we get information about our Bot. Visit the following URL in your browser (substituting the bot token that you got before)

https://api.telegram.org/bot<your-bot-token>/getme

The first part of the URL indicates that we want to communicate with the Telegram API (api.telegram.org). We follow this with /bot to say that we want to send a command to our Bot, and immediately after we add our token to identify which bot we want to send the command to and to prove that we own it. Finally, we specify the command that we want to send (/getme) which in this case just returns basic information about our Bot using JSON. The response should look similar to the following:

{"ok":true,"result":{"id":248718785,"first_name":"To Do Bot","username":"exampletodo_bot"}}

Retrieving messages sent to our Bot

The simplest way for us to retrieve messages sent to our Bot is through the getUpdates call. If you visit https://api.telegram.org/bot<your-bot-token>/getUpdates, you’ll get a JSON response of all the new messages sent to your Bot. Our Bot is brand new and probably hasn’t received any messages yet, so if you visit this now, you should see an empty response.

Telegram Bots can’t talk to users until the user first initiates a conversation (this is to reduce spam). In order to try out the getUpdates call, we’ll first send a message to our Bot from our own Telegram account. Visit telegram.me/<your-bot-username> to open a conversation with your Bot in the web client (or search for @<your-bot-username> in any of the Telegram clients). You should see your Bot displayed with a /start button at the bottom of the screen. Click this button to start chatting with your Bot. Send your Bot a short message, such as “hello”.

Now visit the https://api.telegram.org/bot<your-bot-token>/getUpdates URL again, and you should see a JSON response showing the messages that your bot has received (including one from when you pressed the start button). Let’s take a look at an example of this and highlight the import data that we’ll be writing code to extract in the next section.

{"ok":true,"result":[{"update_id":625407400,
"message":{"message_id":1,"from":{"id":24860000,"first_name":"Gareth","last_name":"Dwyer (sixhobbits)","username":"sixhobbits"},"chat":{"id":24860000,"first_name":"Gareth","last_name":"Dwyer (sixhobbits)","username":"sixhobbits","type":"private"},"date":1478087433,"text":"\/start","entities":[{"type":"bot_command","offset":0,"length":6}]}},{"update_id":625407401,
"message":{"message_id":2,"from":{"id":24860000,"first_name":"Gareth","last_name":"Dwyer (sixhobbits)","username":"sixhobbits"},"chat":{"id":24860000,"first_name":"Gareth","last_name":"Dwyer (sixhobbits)","username":"sixhobbits","type":"private"},"date":1478087624,"text":"test"}}]}

The result section of the JSON is a list of updates that we haven’t acknowledged yet (we’ll talk about how to acknowledge updates later). In this example, our Bot has two new messages. Each message contains a bunch of data about who sent it, what chat it is part of, and the contents of the message. The two pieces of information that we’ll focus on for now are the chat ID, which will allow us to send a reply message and the message text which contains the text of the message. In the next section, we’ll see how to extract these two pieces of data using Python.

Sending a message from our Bot

The final API call that we’ll try out in our browser is that used to send a message. To do this, we need the chat ID for the chat where we want to send the message. There are a bunch of different IDs in the JSON response from the getUpdates call, so make sure you get the right one. It’s the id field which is inside the chat field (24860000 in the example above, but yours will be different). Once you have this ID, visit the following URL in your browser, substituting <chat-id> for your chat ID.

https://api.telegram.org/bot248718785:AAHrlCoWA4I_VskzJEWdWjVXywVUjHyu5mI/sendMessage?chat_id=<chat-id>&text=TestReply

Once you’ve visited this URL, you should see a message from your Bot sent to your which says “TestReply”.

Now that we know how to send and receive messages using the Telegram API, we can get going with automating this process by writing some logic in Python.

Writing the Python code for our Bot

Now we can get to writing Python. Create the file echobot.py and add the following code:

import json 
import requests

TOKEN = "<your-bot-token>"
URL = "https://api.telegram.org/bot{}/".format(TOKEN)


def get_url(url):
    response = requests.get(url)
    content = response.content.decode("utf8")
    return content


def get_json_from_url(url):
    content = get_url(url)
    js = json.loads(content)
    return js


def get_updates():
    url = URL + "getUpdates"
    js = get_json_from_url(url)
    return js


def get_last_chat_id_and_text(updates):
    num_updates = len(updates["result"])
    last_update = num_updates - 1
    text = updates["result"][last_update]["message"]["text"]
    chat_id = updates["result"][last_update]["message"]["chat"]["id"]
    return (text, chat_id)


def send_message(text, chat_id):
    url = URL + "sendMessage?text={}&chat_id={}".format(text, chat_id)
    get_url(url)
    

text, chat = get_last_chat_id_and_text(get_updates())
send_message(text, chat)

Let’s pull apart what this code does:

In lines 1 and 2, we import the requests and json modules. The first is to make web requests using Python and we’ll use it to interact with the Telegram API (similarly to what we were using our web browser for earlier). We’ll use the JSON module to parse the JSON responses from Telegram into Python dictionaries so that we can extract the pieces of data that we need.
The next two lines are global variables, where we define our Bot’s token that we need to authenticate with the Telegram API, and we create the basic URL that we’ll be using in all our requests to the API.
The get_url function simply downloads the content from a URL and gives us a string. We add the .decode("utf8") part for extra compatibility as this is necessary for some Python versions on some platforms. Normally, we’d do some exception handling here as this request could fail if our internet connection were down, if Telegram’s service were down, or if there were an issue with our Token. However for simplicity, here we’ll simply assume that everything Always Works (TM).
The get_json_from_url function gets the string response as above and parses this into a Python dictionary using json.loads() (loads is short for Load String). We’ll always use this one as Telegram will always give us a JSON response.
get_updates calls the same API command that we used in our browser earlier, and retrieves a list of “updates” (messages sent to our Bot).
get_last_chat_id_and_text provides a simple but inelegant way to get the chat ID and the message text of the most recent message sent to our Bot. Because get_updates will always send all the messages that were recently sent to our bot, this is not ideal, as we will always download a whole bunch of messages when we only want the last one. We’ll discuss later in more detail how to do this more elegantly. For now, this function returns a tuple of the chat_id which identifies the specific chat between our Bot and the person who sent the message, and the text, which is the message itself.
send_message takes the text of the message we want to send (text) and the chat ID of the chat where we want to send the message (chat_id). It then calls the sendMessage API command, passing both the text and the chat ID as URL parameters, thus asking Telegram to send the message to that chat.

The final two lines bring everything we have written together to actually receive and send a message. First, we get the text and the chat ID from the most recent message sent to our Bot. Then, we call send_message using the same text that we just received, effectively “echoing” the last message back to the user.

At the moment, our script doesn’t listen for new messages and immediately reply. Instead, when we run it, our bot will fetch only the most recent message sent to that and echo it. We can test it out by sending our bot a message, and then running the script. Give this a go!

Flaws with our bot

The most obvious problem with our Bot is that we have to run a Python script manually every time we want to interact with it. Also, as mentioned before, we always download the entire message history that Telegram provides. This is both inefficient and unreliable, as we don’t want to unnecessarily download the entire message history if we only want a single message, and because Telegram only keeps this list of updates for 24 hours. Another issue is that we pass our message as a string, but because this is converted to a URL before being sent to Telegram, you’ll notice that some unexpected things happen if you send messages to the bot with special characters (for example, the + symbol will disappear from all echoed messages). Finally, the Bot throws an index error if we try to run it when there are no new messages to receive.

We’ll now update our bot to:
* Constantly listen for new messages and reply to each.
* Acknowledge each message as it receives it and tells Telegram to not send us that message again.
* Use Long Polling so that we don’t have to make too many requests.
* Correctly encode our messages to account for URL formatting.

Listening for new messages

We don’t want to manually start our Bot every time that we want it to reply to the latest message, so the first thing to do is to wrap our code that receives new messages and echoes them back in a loop. We’ll also put this in a main function and use the Pythonic if __name__ == '__main__' statement so that we could import our functions into another script without running anything. We don’t want to ask for new updates as fast as possible, so we’ll also put a small delay between requests (this is kinder to Telegram’s servers and better for our own network resources, too).

At the top of the file, add a new import for the Python time module

import time

And change the last two lines of the file to read as follows:

def main():
    last_textchat = (None, None)
    while True:
        text, chat = get_last_chat_id_and_text(get_updates())
        if (text, chat) != last_textchat:
            send_message(text, chat)
            last_textchat = (text, chat)
        time.sleep(0.5)


if __name__ == '__main__':
    main()

This code now gets the most recent messages from Telegram every half second. We now also need to remember the most recent message that we replied to (we save this in the last_textchat variable) so that we don’t keep on sending the echoes every second to messages that we’ve already processed. This is again a very crude way of achieving what we want (for example, if we send the same message to our bot twice in a row, it won’t reply to the second one), but we’ll see a more elegant way to achieve this below. For now, you can run this code and now instead of the script terminating, you’ll see that it keeps running. You can now send your Bot a series of messages, and (as long as you don’t send more than one per half second), you’ll see each of them getting echoed back again.

Acknowledging the messages we’ve already seen

Instead of asking Telegram for all our recent messages with every call, and then trying to figure out which ones we are interested in, we can tell Telegram that we’ve already processed certain messages and that we want to stop receiving them as part of the getUpdates calls. Each update has an update_id field, and these are incremental (later messages have higher numbers). When we make the getUpdates API call, we can optionally pass an offset argument and give an update_id as the value. This tells Telegram that we’ve already seen and processed that message and that we don’t want it again. This also means that Telegram will never send us any of the previous messages (messages with a lower update_id) again either, so we need to make sure that we really are finished with all of the messages before doing this.

Modify our bot as follows:

Add an optional offset parameter to our getUpdates function. If this is specified, we’ll pass it along to the Telegram API to indicate that we don’t want to receive any messages with smaller IDs this. The modified function should look like this:

def get_updates(offset=None):
    url = URL + "getUpdates"
    if offset:
        url += "?offset={}".format(offset)
    js = get_json_from_url(url)
    return js

Add a function that calculates the highest ID of all the updates we receive from getUpdates. This should look as follows.

def get_last_update_id(updates):
    update_ids = []
    for update in updates["result"]:
        update_ids.append(int(update["update_id"]))
    return max(update_ids)

This simply loops through each of the updates that we get from Telegram and then returns the biggest ID. We need this so that we can call getUpdates again, passing this ID, and indicate which messages we’ve already seen.

Add a function to d an echo reply for each message that we receive. This should look as follows:

def echo_all(updates):
    for update in updates["result"]:
        text = update["message"]["text"]
        chat = update["message"]["chat"]["id"]
        send_message(text, chat)

Update the code in main() so that it looks like this:

def main():
    last_update_id = None
    while True:
        updates = get_updates(last_update_id)
        if len(updates["result"]) > 0:
            last_update_id = get_last_update_id(updates) + 1
            echo_all(updates)
        time.sleep(0.5)

Our main code no longer needs to worry about duplicate messages, as each time we get new messages, we send the biggest update_id along with the next request, ensuring that we only ever receive messages that we haven’t seen before.

Note that we have to check if there are new updates (which we do in the third line of main()), and that we have to always send an update ID which is one bigger than the previous one we’ve seen (i.e. we’re actually telling Telegram which ID we’re expecting, not which one we’ve seen).

Try out the changes by restarting the Python script and sending some messages to your Bot—you should see that it works as before, but now it doesn’t matter if you send duplicate messages or send messages too quickly, both of which are big improvements.

Using Long Polling

The last major problem with our Echo Bot is that it has to make a web request every 0.5 seconds. This is not great for Telegram’s servers (they explicitly ask people not to do this outside of testing scenarios) and not great for our resources either. Long Polling takes advantage of the fact that most of the time, we are receiving “empty” responses. Because our Bot is probably not going to be receiving messages every half second, most of the time when we ask for updates, there aren’t any. With Long Polling, instead of Telegram telling us that there aren’t updates, it simply keeps the connection open until there are updates, and then sends these down the open pipe. Of course, it’s impractical to keep a connection open forever, so we can specify the number of seconds that we want to wait for. This is done by passing another optional argument to the getUpdates call, namely timeout.

To make our code use Long Polling, simply update our get_updates method as follows:

def get_updates(offset=None):
    url = URL + "getUpdates?timeout=100"
    if offset:
        url += "&offset={}".format(offset)
    js = get_json_from_url(url)
    return js

Now we always pass along the timeout argument. Because we now have two arguments, we also need to change where we previously had ?offset={} to &offset={} (in URLs, we specify that the argument list is starting with a ? but further arguments are separated with &).

Run the bot again, and it should run exactly as before, but now it’ll be making far fewer requests and using less of your machine’s resources. If you want to check that this is working, simply add a line like print("getting updates") directly below the while True in the main function and run the bot with and without the timeout argument that we just added. Without the timeout, you’ll see that the bot checks for updates every 0.5 seconds. While with the timeout, it will only initiate a new check every 100 seconds, or whenever a new message is received.

Correctly encoding our message text

The final problem of our echo bot is that it acts strangely if we send it messages containing special characters. For example, all + signs disappear from our messages, and all text after an & sign disappears, too. This is caused by these symbols having special meanings in the context of URLs. To fix this, we need to encode any special characters in our message. Luckily, the standard Python urllib has a function that handles this for us, so we only need to import that and add a single line of code.

Add the following line at the top of your .py file

import urllib

And now modify the send_message function to read as follows:

def send_message(text, chat_id):
    text = urllib.parse.quote_plus(text)
    url = URL + "sendMessage?text={}&chat_id={}".format(text, chat_id)
    get_url(url)

Restart the Bot once more, and send it some messages that were problematic before, such as:

+
Hello+
Hello&test

Now it should be able to reply to all of these messages (and pretty much anything else you throw at it, including emoji) flawlessly.

End of Part 1

That brings us to the end of the first part of this tutorial. We built a simple Echo Bot using the Telegram Bot API from scratch and implemented some more advanced features such as keeping track of which messages we’d already processed, using Long Polling, and correctly encoding our messages for URLs. In the next part, we’ll add a database and turn our Bot into something more useful—a To Do List.

The final code listing for the Echo Bot presented here can be found at https://github.com/sixhobbits/python-telegram-tutorial.

More to come: Watch out for Parts 2 and 3 of this 3-part tutorial.

November 10, 2016 11:03 AM

Brian Okken

PythonBytes.fm

Michael Kennedy from Talk Python to Me and I have launched a new podcast, called Python Bytes, “Python headlines delivered directly to your earbuds”. It’s a weekly short format podcast. Please check it out. The first few weeks of a podcast can really make a difference if we can get a bunch of listeners to […]

The post PythonBytes.fm appeared first on Python Testing.

November 10, 2016 08:49 AM

24: pytest with Raphael Pierzina

pytest pytest is an extremely popular test framework used by many projects and companies. In this episode, I interview Raphael Pierzina, a core contributor to both pytest and cookiecutter. We discuss how Raphael got involved with both projects, his involvement in cookiecutter, pytest, “adopt pytest month”, the pytest code sprint, and of course some of […]

The post 24: pytest with Raphael Pierzina appeared first on Python Testing.

November 10, 2016 08:46 AM

Daniel Bader

How code linting will make you awesome at Python

In Python code reviews I’ve seen over and over that it can be tough for developers to format their Python code in a consistent way: extra whitespace, irregular indentation, and other “sloppiness” then often leads to actual bugs in the program.

Luckily automated tools can help with this common problem. Code linters make sure your Python code is always formatted consistently – and their benefits go way beyond that.

What code linters can do for you

A code linter is a program that analyses your source code for potential errors. The kinds of errors a linter can detect include:

syntax errors;
structural problems like the use of undefined variables;
best practice or code style guideline violations.

I find code linting to be an indispensable productivity tool for writing Python. It’s possible to integrate linting into your editing environment. This gives you immediate feedback on your code right when you type it:

For some classes of errors, linting can shorten the usual write code, run code, catch error, fix error loop to write code, see and fix error. This difference might not seem much – but in the course of a day these time savings add up quickly and can have a huge impact on your productivity.

In short, code linters are great!

Which Python linter should I use?

Python has several good options for code linters. The ones I’m listing here are available for free and are open-source software:

Flake8 is my personal favorite these days. It’s fast and has a low rate of false positives. Flake8 is actually a combination of several other tools, mainly the Pyflakes static analysis tool and the Pycodestyle (former pep8) code style checker.
Pylint is another good choice. It takes a little more effort to set up than Flake8 and also triggers more false positives. On the other hand it provides a more comprehensive analysis. Definitely not a bad choice – but I’d stick with Flake8 if you’re just starting out.

I’m sold – what’s the quickest way to get started?

If you’re not using a linter yet you’re missing out on some really great benefits. But don’t worry, I’ve got your back – I recorded a 5 minute Python linting video tutorial you can watch below.

In the video I’ll give you the run down on how to set up the Flake8 Python linter from scratch. With a few simple steps you’ll be able run a code linter on your own Python programs. I’ll also demonstrate how linter feedback can be integrated with your code editor (I’m using Sublime Text 3 in the video).

I’ve seen great results from using linters. I believe they’re one of the quickest ways to improve your Python skills. Spend 5 minutes to try out Flake8 – I’m sure it’ll be well worth your time 😊

Enjoy the video:

November 10, 2016 12:00 AM

November 09, 2016

PyCon

Tutorial proposals are due in three weeks

The PyCon 2017 call for proposals (CFP) first opened about a month ago, and the team who will be bringing the conference to Portland have been excited to watch the first wave of submissions roll in. Exciting topics from across the PyCon community have already been proposed for our talks, tutorials, and poster schedules.

But we know that many of you are brimming with ideas that you have not yet submitted, so we wanted to remind you of this year’s deadlines:

Talk proposals will be due on 2017 January 3.
Poster proposals will be due on 2017 January 3.
Tutorial proposals are due on 2017 November 30.

Yes, that’s right — tutorial proposals are due in three weeks!

Last year we explained the one-month difference between the talk and tutorial deadlines in a detailed blog post that we invite you to review this year if you want to understand why the Tutorial review process takes more time for its committee. Entrusted with the one PyCon schedule for which attendees pay an individual fee per course, the Tutorial Committee takes extra time to make sure that courses are going to live up to the conference’s high reputation. As the Tutorials Chair, Ruben Orduz, reminded us last year:

“It’s a very time-consuming process, but it helps in selecting the best lineup while making sure every tutorial that had potential was given a fair chance. Compressing the timeline would mean only selecting from the top well-known proposers and forgetting the rest. That would be against our philosophy of giving chances to new instructors and increasing diversity.”

So we hope those of you with dreams of offering a tutorial will find the time within the next two weeks to get your proposal written up and submitted. Just visit our “Proposing a Tutorial” page for a guide to writing up your idea and getting it submitted — before November 30, when our Tutorials CFP will close once it is midnight and the day is over in every time zone. Good luck!

November 09, 2016 06:37 PM

Mike Driscoll

An intro to aiohttp

Python 3.5 added some new syntax that allows developers to create asynchronous applications and packages easier. One such package is aiohttp which is an HTTP client/server for asyncio. Basically it allows you to write asynchronous clients and servers. The aiohttp package also supports Server WebSockets and Client WebSockets. You can install aiohttp using pip:

pip install aiohttp

Now that we have aiohttp installed, let’s take a look at one of their examples!

Fetching a Web Page

The documentation for aiohtpp has a fun example that shows how to grab a web page’s HTML. Let’s take a look at it and see how it works:

import aiohttp
import asyncio
import async_timeout
 
async def fetch(session, url):
    with async_timeout.timeout(10):
        async with session.get(url) as response:
            return await response.text()
 
async def main(loop):
    async with aiohttp.ClientSession(loop=loop) as session:
        html = await fetch(session, 'http://www.blog.pythonlibrary.org')
        print(html)
 
loop = asyncio.get_event_loop()
loop.run_until_complete(main(loop))

Here we just import aiohttp, Python’s asyncio and async_timeout, which gives us the ability to timeout a coroutine. We create our event loop at the bottom of the code and call the main() function. It will create a ClientSession object that we pass to our fetch() function along with what URL to fetch. Finally in the fetch() function, we use set our timeout and attempt to get the URL’s HTML. If everything works without timing out, you will see a bunch of text spewed into stdout.

Downloading Files with aiohttp

A fairly common task that developers will do is download files using threads or processes. We can download files using coroutines too! Let’s find out how:

import aiohttp
import asyncio
import async_timeout
import os
 
 
async def download_coroutine(session, url):
    with async_timeout.timeout(10):
        async with session.get(url) as response:
            filename = os.path.basename(url)
            with open(filename, 'wb') as f_handle:
                while True:
                    chunk = await response.content.read(1024)
                    if not chunk:
                        break
                    f_handle.write(chunk)
            return await response.release()
 
 
async def main(loop):
    urls = ["http://www.irs.gov/pub/irs-pdf/f1040.pdf",
        "http://www.irs.gov/pub/irs-pdf/f1040a.pdf",
        "http://www.irs.gov/pub/irs-pdf/f1040ez.pdf",
        "http://www.irs.gov/pub/irs-pdf/f1040es.pdf",
        "http://www.irs.gov/pub/irs-pdf/f1040sb.pdf"]
 
    async with aiohttp.ClientSession(loop=loop) as session:
        for url in urls:
            await download_coroutine(session, url)
 
 
if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main(loop))

You will notice here that we import a couple of new items: aiohttp and async_timeout. The latter is a actually one of the aiohttp’s dependencies and allows us to create a timeout context manager. Let’s start at the bottom of the code and work our way up. In the bottom conditional statement, we start our asynchronous event loop and call our main function. In the main function, we create a ClientSession object that we pass on to our download coroutine function for each of the urls we want to download. In the download_coroutine, we create an async_timeout.timeout() context manager that basically creates a timer of X seconds. When the seconds run out, the context manager ends or times out. In this case, the timeout is 10 seconds. Next we call our session’s get() method which gives us a response object. Now we get to the part that is a bit magical. When you use the content attribute of the response object, it returns an instance of aiohttp.StreamReader which allows us to download the file in chunks of whatever size we’d like. As we read the file, we write it out to local disk. Finally we call the response’s release() method, which will finish the response processing.

According to aiohttp’s documentation, because the response object was created in a context manager, it technically calls release() implicitly. But in Python, explicit is usually better and there is a note in the documentation that we shouldn’t rely on the connection just going away, so I believe that it’s better to just release it in this case.

There is one part that is still blocking here and that is the portion of the code that actually writes to disk. While we are writing the file, we are still blocking. There is another library called aiofiles that we could use to try and make the file writing asynchronous too We will take a look at that next.

Note: The section above cam from one of my previous articles.

Using aiofiles For Asynchronous Writing

You will need to install aiofiles to make this work. Let’s get that out of that way:

pip install aiofiles

Now that we have all the items we need, we can update our code!

import aiofiles
import aiohttp
import asyncio
import async_timeout
import os
 
 
async def download_coroutine(session, url):
    with async_timeout.timeout(10):
        async with session.get(url) as response:
            filename = os.path.basename(url)
            async with aiofiles.open(filename, 'wb') as fd:
                while True:
                    chunk = await response.content.read(1024)
                    if not chunk:
                        break
                    await fd.write(chunk)
            return await response.release()
 
 
async def main(loop):
    urls = ["http://www.irs.gov/pub/irs-pdf/f1040.pdf",
        "http://www.irs.gov/pub/irs-pdf/f1040a.pdf",
        "http://www.irs.gov/pub/irs-pdf/f1040ez.pdf",
        "http://www.irs.gov/pub/irs-pdf/f1040es.pdf",
        "http://www.irs.gov/pub/irs-pdf/f1040sb.pdf"]
 
    async with aiohttp.ClientSession(loop=loop) as session:
        for url in urls:
            await download_coroutine(session, url)
 
 
if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main(loop))

The only change is adding an import for aiofiles and then changing how we open the file. You will note that it is now

async with aiofiles.open(filename, 'wb') as fd:

And that we use await for the writing portion of the code:

await fd.write(chunk)

Other than that, the code is the same. There are some portability issues mentioned here that you should be aware of.

Wrapping Up

Now you should have some basic understanding of how to use aiohttp and aiofiles. The documentation for both projects is worth a look as this tutorial really only scratches the surface of what you can do with these libraries.