Planet Python
Last update: January 09, 2017 01:47 AM
January 08, 2017
Nikola
Nikola v7.8.2 is out!
On behalf of the Nikola team, I am pleased to announce the immediate availability of Nikola v7.8.2. It adds a ton of new features, while also fixing some bugs.
Key Changes
- A rewrite of all taxonomies (tags, categories, sections, etc.) in a generic fashion, allowing for much greater flexibility (by Felix Fontein) — adds new settings, enables new features (CREATE_ARCHIVE_NAVIGATION) and customizability
- Automatic creation of year/month/day directory structures (NEW_POST_DATE_PATH)
- Ability to sort posts from within templates (sort_posts)
- API changes for post compilers (new compile, compile_string functions)
- Addition of a generator meta tag to default themes — we hope you do not mind a bit of promotion for Nikola?
What is Nikola?
Nikola is a static site and blog generator, written in Python. It can use Mako and Jinja2 templates, and input in many popular markup formats, such as reStructuredText and Markdown — and can even turn Jupyter (IPython) Notebooks into blog posts! It also supports image galleries, and is multilingual. Nikola is flexible, and page builds are extremely fast, courtesy of doit (which rebuilds only what has changed).
Find out more at the website: https://getnikola.com/
Changes
Minor API change: The compile_string compiler method (partially internal) now takes a post argument and returns between two and four values, adding shortcode_deps and shortcode support. See issues #2623 and #2624.
Features
- Add meta generator tag to default templates to promote Nikola (Issue #2619)
- Add nikola new_post -d and NEW_POST_DATE_PATH to allow automatic creation of year/month/day (date-based) directory structures (Issue #2513)
- Allow enabling pretty URLs with per-post setting (Issue #2613)
- Add a sort_posts function (available as Jinja filter in global context), which allows general-purpose timeline sorting (Issue #2602)
- Allow creating archive navigation (Issue #1639)
- Accept a page argument for taxonomy paths (Issue #2585)
- Query strings in magic links are passed as keyword arguments to path handlers (via Issue #2580)
- Accept arbitrary arguments to path handlers (via Issue #2580)
- Added new typogrify_oldschool filter (Issue #2574)
- Improve handling of .dep files, and allow compilers to specify additional targets for the render_posts task (Issue #2536)
- render_template and generic_renderer can now create HTML fragments.
- Allow posts to set custom URL_TYPE by using the url_type meta tag (useful for HTML fragments inserted using JavaScript)
- Plugins can depend on other plugins being installed (Issue #2533)
- The destination folder in POSTS and PAGES can now be translated (Issue #2116)
- Pass post object and lang to post compilers (Issue #2531)
- Pass url_type into template's context.
- Make thumbnail naming configurable with IMAGE_THUMBNAIL_FORMAT.
- There is a new plugin category Taxonomy which makes it easy to create new taxonomies. All of the existing taxonomies (authors, archives, indexes, page index, sections, tags, and categories) have been converted to the new system. (Issue #2107)
- Added CATEGORIES_INDEX_PATH, similar to TAGS_INDEX_PATH. (Issue #2567)
- Made INDEX_PATH, RSS_PATH and AUTHOR_PATH translatable. (Issue #1914)
- Added setting SHOW_INDEX_PAGE_NAVIGATION which enables a basic page navigation for indexes. (Issue #2299)
- Added settings DISABLE_INDEXES_PLUGIN_INDEX_AND_ATOM_FEED and DISABLE_INDEXES_PLUGIN_RSS_FEED to disable specific parts of the classify_indexes taxonomy plugin. (Issue #2591)
Bugfixes
- Work around conflicts between posts and sections trying to render index.html files (via Issue #2613)
- Make AUTHOR_PAGES_ARE_INDEXES really work (Issue #2600)
- WordPress importer now correctly handles & etc. in tags. (Issue #2557)
- If CODE_COLOR_SCHEME is empty, don’t generate code.css (Issue #2597)
- Don’t warn about nikolademo DISQUS account when comments are disabled (Issue #2588)
- Make data from global context available to templated shortcodes as global_data (Issue #2488)
- Don't crash if plugins is a file (Issue #2539)
- Don't mangle bare # links (Issue #2553)
- generic_index_renderer now always produces output. It previously did not when the post list was empty and INDEXES_STATIC == False. (via Issue #2579)
Kushal Das
Using rkt and systemd

A few days back, I wrote about my usage of rkt containers. As rkt does not have a daemon running, the simplest way to keep a container running is to start it inside a screen or tmux session. I followed the same path at first and used a tmux session.
But then I wanted better control over the containers, to start or stop them as required. Systemd is the solution for all the other services in the system, which makes it an ideal candidate for this case too.
Example of a service file
[Unit]
Description=ircbot
Documentation=https://github.com/kushaldas/ircbot
Requires=network-online.target
[Service]
Slice=machine.slice
MemoryLimit=500M
ExecStart=/usr/bin/rkt --insecure-options=image --debug run --dns=8.8.8.8 --volume mnt,kind=host,source=/some/path,readOnly=false /mnt/ircbot-latest-linux-amd64.aci
ExecStopPost=/usr/bin/rkt gc --mark-only
KillMode=mixed
Restart=always
The path of the service file is /etc/systemd/system/ircbot.service. In the [Unit] section, I mentioned a super short Description, and link to the documentation of the project. I also mentioned that this service requires network-online.target to be available first.
The [Service] is the part where we define all the required configurations. The first value we mention is the Slice.
Slices, a way to do resource control
Systemd uses slices to group services in a hierarchical tree. This is built on top of the Linux kernel's control group (cgroup) feature. By default, there are four different slices in a system.
- -.slice : The root slice.
- system.slice : All system services are in this slice.
- machine.slice : All vms and containers are in this slice.
- user.slice : All user sessions are in this slice.
We can see the whole hierarchy using the systemd-cgls command. For example:
Control group /:
-.slice
├─machine.slice
│ ├─ircbot.service
│ │ ├─11272 /usr/bin/systemd-nspawn --boot --register=true -Zsystem_u:system_r:container_t:s0:c447,c607 -Lsystem_u:object_r:container_file_t:s0:c447,
│ │ ├─init.scope
│ │ │ └─11693 /usr/lib/systemd/systemd --default-standard-output=tty
│ │ └─system.slice
│ │ ├─ircbot.service
│ │ │ └─11701 /usr/bin/ircbot
│ │ └─systemd-journald.service
│ │ └─11695 /usr/lib/systemd/systemd-journald
├─user.slice
│ └─user-1000.slice
│ ├─session-31.scope
│ │ ├─16228 sshd: kdas [priv]
│ │ ├─16231 sshd: kdas@pts/0
│ │ ├─16232 -bash
│ │ ├─16255 sudo su -
│ │ ├─16261 su -
│ │ └─16262 -bash
You can manage various resources using cgroups. Here, in our example service file, I set the memory limit for the service to 500MB. You can read more on resource management here.
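Other cgroup-based limits can be set in the [Service] section in the same way; for example (the values and options below are illustrative, not from the original setup):

```ini
[Service]
; Cap the memory available to the container.
MemoryLimit=500M
; Limit the service to half of one CPU.
CPUQuota=50%
```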
There is also systemd-cgtop tool, which will give you a top like view for the various resources consumed by the slices.
# systemd-cgtop -M rkt-250d0c2b-0130-403b-a9a6-3bb3bde4e934
Control Group Tasks %CPU Memory Input/s Output/s
/machine.slice/ircbot.service 9 - 234.0M - -
/machine.slice/ircbot.service/system.slice - - 5.0M - -
/machine.slice/ircbot.service/system.slice/ircbot.service - - 5.0M - -
The actual command which we used to run the container is mentioned in ExecStart.
Using the service
I can now use the standard systemctl commands for this new ircbot service. For example:
# systemctl start ircbot
# systemctl enable ircbot
# systemctl stop ircbot
# systemctl status ircbot
You can also view the log of the application using journalctl command.
# journalctl -u ircbot
The documentation from rkt has more details on systemd and rkt.
Vasudev Ram
An Unix seq-like utility in Python
By Vasudev Ram
Due to a chain (or sequence - pun intended :) of thoughts, I got the idea of writing a simple version of the Unix seq utility (command-line) in Python. (Some Unix versions have a similar command called jot.)
Note: I wrote this program just for fun. As the seq Wikipedia page says, modern versions of bash can do the work of seq. But this program may still be useful on Windows - I am not sure whether the CMD shell has seq-like functionality; PowerShell probably has it, is my guess.
The seq command lets you specify one, two or three numbers as command-line arguments (some of which are optional): the start, stop and step values, and it outputs all numbers in that range, with the given step between them (the default step is 1). I have not tried to exactly emulate seq; instead I've written my own version. One difference is that mine does not support the step argument (so the step is always 1), at least in this version. That can be added later. Another is that I print the numbers with spaces between them, not newlines. Another is that I don't support floating-point numbers in this version (again, that can be added).
The seq command has more uses than the above description might suggest; in fact, it is mainly used as a building block for other tasks rather than for just printing a sequence of numbers - after all, who would often need to do just that? Here is one example, on Unix (from the Wikipedia article about seq):
# Remove file1 through file17:
for n in `seq 17`
do
    rm file$n
done
Note that those are backquotes or grave accents around seq 17 in the above code snippet. It uses sh / bash syntax, so requires one of those shells, or a compatible one.
Here is the code for seq1.py:
'''
seq1.py
Purpose: To act somewhat like the Unix seq command.
Author: Vasudev Ram
Copyright 2017 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
'''

import sys

def main():
    sa, lsa = sys.argv, len(sys.argv)
    if lsa < 2:
        sys.exit(1)
    try:
        start = 1
        if lsa == 2:
            end = int(sa[1])
        elif lsa == 3:
            start = int(sa[1])
            end = int(sa[2])
        else: # lsa > 3
            sys.exit(1)
    except ValueError as ve:
        sys.exit(1)
    for num in xrange(start, end + 1):
        print num,
    sys.exit(0)

if __name__ == '__main__':
    main()

And here are a few runs of seq1.py, and the output of each run, below:
$ py -2 seq1.py
$ py -2 seq1.py 1
1
$ py -2 seq1.py 2
1 2
$ py -2 seq1.py 3
1 2 3
$ py -2 seq1.py 1 1
1
$ py -2 seq1.py 1 2
1 2
$ py -2 seq1.py 1 3
1 2 3
$ py -2 seq1.py 4
1 2 3 4
$ py -2 seq1.py 1 4
1 2 3 4
$ py -2 seq1.py 2 2
2
$ py -2 seq1.py 5 3
$ py -2 seq1.py -6 -2
-6 -5 -4 -3 -2
$ py -2 seq1.py -4 -0
-4 -3 -2 -1 0
$ py -2 seq1.py -5 5
-5 -4 -3 -2 -1 0 1 2 3 4 5
There are many other possible uses for seq, if one uses one's imagination, such as rapidly generating various filenames or directory names, with numbers in them (as a prefix, suffix or in the middle), for testing or other purposes, etc.
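As a small Python 3 sketch of that filename idea (my own example, not from the article; the helper name seq is made up):

```python
def seq(start, end):
    # Inclusive range, like the Unix seq command with step 1.
    return range(start, end + 1)

# Rapidly generate numbered test filenames, as suggested above.
filenames = ["file{}.txt".format(n) for n in seq(1, 3)]
print(" ".join(filenames))  # file1.txt file2.txt file3.txt
```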
- Enjoy.
- Vasudev Ram - Online Python training and consulting Get updates (via Gumroad) on my forthcoming apps and content. Jump to posts: Python * DLang * xtopdf Subscribe to my blog by email My ActiveState Code recipesFollow me on: LinkedIn * Twitter Managed WordPress Hosting by FlyWheel
January 07, 2017
Christoph Zwerschke
Never iterate a changing dict
Yesterday I noticed a bug in a Python program that only appeared when running it with the new Python 3.6.
It turned out that the program had been on a slippery slope all along.
Essentially, the program tried to find for a given list of field names, like
names = ['alpha', 'bravo', 'charlie', 'delta']
and a given format-string, like
fmtstr = 'show only {alpha}, {bravo} and {charlie}'
which of the given field names were used in the string. The following code was used to determine this:
used_names = []
d = dict.fromkeys(names)
for k in d:
    del d[k]
    try:
        fmtstr.format(**d)
    except KeyError:
        used_names.append(k)
    d[k] = None
print("Used names:", ', '.join(sorted(used_names)))
The code simply tries to format the string while successively omitting one of the given field names. If formatting fails with a KeyError, it knows that the field name is used.
When you run this code with Python versions before 3.6, it works as expected:
Used names: alpha, bravo, charlie
However, when you try to run it with Python 3.6, it will print out something very strange:
Used names: alpha, alpha, alpha, alpha, alpha, bravo, bravo, bravo, bravo
What’s happening here? Can you spot the problem?
If you look carefully, you probably see it: Yes, this is another instance of the dreaded “changing a mutable object while iterating it” problem, that you surely have already experienced sometime or other when getting an error message like this:
RuntimeError: dictionary changed size during iteration
In this case, however, the dictionary did not change its size. It did not even change its keys between iterations. So you wouldn’t think there could be a problem with this code, and in fact it worked fine until recently. But in Python 3.6 the dict type has been reimplemented to use a more compact representation. This implementation does not pardon iterating over a changing dictionary even if you try to restore removed keys immediately: it maintains insertion order with an additional level of indirection, which causes hiccups when keys are removed and re-inserted while iterating, since that changes the order and the internal pointers of the dict.
Note that this problem is not fixed by iterating d.keys() instead of d, since in Python 3, d.keys() returns a dynamic view of the keys in the dict which results in the same problem. Instead, iterate over list(d). This will produce a list from the keys of the dictionary that will not change during iteration. Or you can also iterate over sorted(d) if a sorted order is important.
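Applied to the code above, that fix is a one-line change; this version behaves the same on every Python version:

```python
names = ['alpha', 'bravo', 'charlie', 'delta']
fmtstr = 'show only {alpha}, {bravo} and {charlie}'

used_names = []
d = dict.fromkeys(names)
for k in list(d):  # iterate over a snapshot of the keys
    del d[k]
    try:
        fmtstr.format(**d)
    except KeyError:
        used_names.append(k)
    d[k] = None
print("Used names:", ', '.join(sorted(used_names)))
```

This prints Used names: alpha, bravo, charlie, as expected.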
Just to make this clear: This is not a bug in Python 3.6. Iterating an object and changing it at the same time was always considered unsafe and bad style. The benefits of the new dict implementation are great enough to accept this kind of incompatibility. However, I wonder whether it would be possible and beneficial to safeguard the for loop with a check of the private version of the dict that has just been added in Python 3.6 as well (see PEP509), and raise a RuntimeError if the version changes, similarly to how a change of the dictionary size is already detected and reported as an error. Then running programs like the one above would raise an error instead of failing in strange and nondeterministic ways.
To sum up the morale of this story: Never iterate a changing dictionary, even if you preserve its size and keys. Instead run the for loop over a copy of the keys or items of the dictionary.
Django Weblog
2017 DSF Board Election Results
We're happy to announce the winners of the DSF Board elections for 2017.
Frank Wiles, Daniele Procida, and James Bennett were re-elected for another term. Our new Board members are Kenneth Love, Ken W. Alger, and Rebecca Conley.
Rebecca, as you may be aware, served as Board Secretary during 2016 to fill a vacancy but will be returning again this year.
We wish to thank Christophe Pettus and Karen Tracey who did not run again this year for their service and the wisdom they brought to us.
The Board will be having our first meeting in the coming days to ratify the slate of officers at which time we'll update the website accordingly.
We look forward to another great year of helping further Django and the Django Community.
Programming Ideas With Jake
Default Implementations Using Delegation
Let's look at how we can change default method implementations to use delegation instead of inheritance, not that it's even a good idea.
Weekly Python Chat
Making your MVP
Special guest Tracy Osborn, author of Hello Web App, will join us to talk about launching your side project.
Jason Meyers
Making the Python REPL output Pretty
Recently, Ned Batchelder tweeted an illustration of how to make the Python REPL output prettier. I went to implement it, and wanted to put together some instructions for the future.
PYTHONSTARTUP
PYTHONSTARTUP is an environment variable that points to a Python file containing a series of commands that will be evaluated when we launch a Python REPL. I have the following code in a file called .python-startup.py.
import pprint
import sys
sys.ps1 = "\033[0;34m>>> \033[0m"
sys.ps2 = "\033[1;34m... \033[0m"
sys.displayhook = pprint.pprint
The code above sets the >>> prompt to a light blue and the ... prompt to a darker blue, but that probably isn’t the part you are here for. You want the last line, which sets the displayhook for output to pretty print.
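One caveat worth knowing (my own note, not from the original post): the default displayhook suppresses None and stores the last result in _, while a bare pprint.pprint does neither. A small wrapper restores both behaviors:

```python
import builtins
import pprint
import sys

def pretty_displayhook(value):
    # Mimic the default hook: skip None, remember the last result in _.
    if value is None:
        return
    builtins._ = value
    pprint.pprint(value)

sys.displayhook = pretty_displayhook
```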
Next, you can export the PYTHONSTARTUP environment variable pointing to your file as shown here.
export PYTHONSTARTUP=~/.python-startup.py
You can also add this to your .bashrc or .zshrc, depending on which shell you are using, and your Python REPL output will always be pretty.
So what is the difference?
First let’s look at the normal output:
>>> dessert = {'cookies': {'chocolate chip': 1, 'oatmeal raisin': 12, 'peanut butter': 3},
... 'cake': 'OMG NO!',
... 'pie': {'apple': 1, 'peach': 2, 'fudge': 0}}
>>> dessert
{'cookies': {'chocolate chip': 1, 'oatmeal raisin': 12, 'peanut butter': 3}, 'cake': 'OMG NO!', 'pie': {'apple': 1, 'peach': 2, 'fudge': 0}}
Now let’s see it with the pretty print in place:
>>> dessert = {'cookies': {'chocolate chip': 1, 'oatmeal raisin': 12, 'peanut butter': 3},
... 'cake': 'OMG NO!',
... 'pie': {'apple': 1, 'peach': 2, 'fudge': 0}}
>>> dessert
{'cake': 'OMG NO!',
'cookies': {'chocolate chip': 1, 'oatmeal raisin': 12, 'peanut butter': 3},
'pie': {'apple': 1, 'fudge': 0, 'peach': 2}}
If you are curious what the colors look like:

January 06, 2017
Peter Bengtsson
ElasticSearch 5 in Travis-CI
Getting ElasticSearch 5.1.1 installed in .travis.yml was hard but I eventually figured it out.
Weekly Python StackOverflow Report
(liv) stackoverflow python report
These are the ten most rated questions at Stack Overflow last week.
Between brackets: [question score / answers count]
Build date: 2017-01-06 18:27:11 GMT
- What is the difference between i = i + 1 and i += 1 in a 'for' loop? - [78/6]
- Why does list ask about __len__? - [22/2]
- Why is regex search slower with capturing groups in Python? - [16/2]
- ZipFile.testzip() returning different results on Python 2 and Python 3 - [15/2]
- Find the number of characters in a file using python - [13/11]
- Python list error: [::-1] step on [:-1] slice - [12/5]
- push_back/emplace_back a shallow copy of an object into another vector - [11/4]
- Identify clusters linked by delta to the left and different delta to the right - [11/1]
- Python overriding __init__ - [10/2]
- Average face - algorithm - [10/1]
Luke Plant
Django admin tips Twitter account
I started a new Twitter account — @djangoadmintips, with curated tips about using the Django admin. While it has its limits, there are tons of great ways to extend the Django admin and make the most out of it, which many people are not aware of.
In addition to curating the best of what I find on the web, there will also be some original blog posts, which will usually have accompanying code.
PyTennessee
PyTN Profiles: Maxwell Collins-Shenfield and Eventbrite

Speaker Profile: Maxwell Collins-Shenfield
I work on the Financial Transactions team at Eventbrite. Our job is to ensure accurate and complete accounting of all transactions that take place on the Eventbrite platform. We also provide tools and guidance for our support staff dealing with billing and accounting concerns.
Maxwell will be presenting “How to Handle Money in your Code” at 1:00PM Saturday (2/4) in Room 100. Dealing with money in Python is easy to mess up, and the consequences can be lasting for your business and brand. Learn some simple rules for storing, processing, and displaying money the right way, with some real-world examples using Django.

Sponsor Profile: Eventbrite (@britenashville)
Eventbrite is the world’s largest self-service ticketing platform. We build the technology to allow anyone to create, share, find and attend events that fuel their passions and enrich their lives. Music festivals, marathons, conferences, hackathons, air guitar competitions, political rallies, fundraisers, gaming competitions - you name it, we power it. Our mission? To bring the world together through live experiences.
Our technology spans across web, mobile, big data, search, physical point of sale and mobile box office devices; and we’re committed to releasing open-source packages so others can benefit from and contribute to our work. Our team is whip-smart, fun-loving, and building products that are used and loved by millions worldwide.
Dataquest
Why You Should Use Dataquest To Learn Data Science
When I launched Dataquest a little under two years ago, one of the first things I did was write a blog post about why. At the time, if you wanted to become a data scientist, you were confronted with dozens of courses on sites like edX or Coursera with no easy path to getting a job.
I saw many promising students give up on learning data science because they got stuck in a loop of taking the same courses over and over. There were two main barriers to learning data science that I was trying to solve with Dataquest: the challenge of getting from theory to application, and the challenge of knowing what to learn next.
I strongly believe that everyone deserves a chance to do work that they find interesting, and Dataquest was a way to put that belief into action and help others get a toehold in a difficult field. Over the past two years, we’ve made it simple to learn all of the skills you need for a data science role in one place. From basic Python to SQL to Machine Learning, Dataquest teaches you the right skills, and helps you build a portfolio of...
Import Python
ImportPython Weekly 106 - Golang runtime for Python, Handling Unicode Strings, Packaging and more
Worthy Read
Grumpy is an experimental Python runtime for Go. It translates Python code into Go programs, and those transpiled programs run seamlessly within the Go runtime. We needed to support a large existing Python codebase, so it was important to have a high degree of compatibility with CPython (quirks and all). The goal is for Grumpy to be a drop-in replacement runtime for any pure-Python project. Curator's note - If you are a Go Programming Language developer, do check out http://importgolang.com/newsletter/
golang, grumpy
Discover how you can automate & optimize your projects using a DevOps toolchain architecture that enables a continuous delivery process.
Sponsor
Take the Twitter poll: do you see yourself using Python 3.x in 2017?
importpython
I am a seasoned Python developer; I have seen many UnicodeDecodeErrors myself, and I have seen many new Pythonistas run into problems related to Unicode strings. Actually understanding and handling text data in computers is never easy. Sometimes the programming language makes it even harder. In this post, I will try to explain everything about text and Unicode handling in Python.
unicode
An improved asyncio module, Pyjion for speed, and moving to Python 3 will make for a rich Python ecosystem.
In this Tutorial I will describe how you can get started with Machine Learning on Linux using Scikit-Learn and Python 3.
machine learning
A library for managing setuptools packaging needs in a consistent manner. pbr reads and then filters the setup.cfg data through a setup hook to fill in default values and provide more sensible behaviors, and then feeds the results in as the arguments to a call to setup.py - so the heavy lifting of handling python packaging needs is still being done by setuptools.
packaging
This is an old article written by Guido van Rossum himself. Occasionally people ask me about the origins of my nickname BDFL (Benevolent Dictator For Life). At some point, Wikipedia claimed it was from a Monty Python skit, which is patently false, although it has sometimes been called a Pythonesque title. I recently trawled through an old mailbox of mine, and found a message from 1995 that pinpoints the origin exactly. I'm including the entire message here, to end any doubts that the term originated in the Python community.
Guido
This talk explores how you can build applications and APIs with Flask step by step by being easy to test and scale to larger and more complex scenarios. The talk will also go a bit into the history of some design decisions in Flask and what works well and in which areas you might want to mix it with other technologies for better results.
flask, video
Copy-paste this into your Python 3 interpreter to see a human-readable version of the raw SQL queries that your Django code is running.
ORM
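The snippet itself isn't reproduced in the newsletter, but one common way to get a similar effect (an assumption on my part, not necessarily the linked code) is to turn on Django's SQL logger in the shell:

```python
import logging

# Django logs every query it executes under this logger name at DEBUG level.
logging.basicConfig()
logging.getLogger('django.db.backends').setLevel(logging.DEBUG)
```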
Aisha (@AishaXBello) joined the Django community when she attended a Django Girls workshop during EuroPython in 2015. From that point on, Aisha's trajectory in the Django world was unstoppable. She is not only a talented developer but her desire to keep learning and sharing her knowledge with others is simply inspiring. She organized or helped organize a huge number of Django Girls workshop in her home country of Nigeria. Thanks to her, Nigeria is on its way to be the world-record holder of most Django Girls events organized.
djangogirls
flask
Detect lanes on video frames, using NumPy and SciPy. My goal is not to achieve better performance or speed than with OpenCV. Rather, I’m going to implement some techniques learned in the Computer Vision course.
numpy, scipy
machine learning
code snippet
jupyter
podcast
Machine learning can be intimidating for a newcomer. The very concept of a machine learning algorithm is quite abstract. How does it work in practice? In order to demystify some of the magic behind machine learning algorithms, I decided to implement a simple one from scratch. I will not be using a library such as scikit-learn, which already has many algorithms implemented. Instead, I’ll be writing all of the code myself in order to have a working binary classifier. The goal of this exercise is to understand its inner workings.
machine learning, classification
Projects
hello-vue-django - 11 Stars, 1 Fork
vuejs and Django integration with hot code reload
alexabot-asana - 6 Stars, 1 Fork
AlexaBot for Asana -- Create Asana Tasks with Amazon Echo
tensorflow-mnist-cnn - 5 Stars, 1 Fork
MNIST classification using a Convolutional Neural Network. Various techniques such as data augmentation, dropout, batch normalization, etc. are implemented.
Django Weekly
Django Weekly 20 - Django REST Chapter, View ORM SQL Query, Top 10 Django Projects, Testing Views
Worthy Read
Harry, the author of the book, is looking for feedback on the Appendix. Having said that, if you haven't used DSF this is a good introduction to DSF.
DSF
Discover how you can automate & optimize your projects using a DevOps toolchain architecture that enables a continuous delivery process.
sponsor
Many developers, beginners and advanced ones alike, struggle to pick the right framework to work with. Some like the simplicity of Flask, which gives them a lot of freedom to choose whatsoever they want. That’s the problem right there: you have to choose, decide, maintain, glue and assemble tons of libraries to get your project working, and keep doing this for as long as your application exists.
django, flask
See a human-readable version of the raw SQL queries that your Django code is running.
Curator's Note - Not a big fan of Top N listings. The author has listed his criteria for selecting these projects at the end of the article. It's a good list of Django projects nevertheless; have a look.
packages
I tried 3 different ways of testing a Django view. These are my thoughts.
testing, authentication
The Django Software Foundation (DSF) is proud to announce the winner of the 2016 Malcolm Tredinnick Memorial Prize: Aisha Bello!
DSF
Projects
DjangoHero - 28 Stars, 2 Fork
DjangoHero is the fastest way to set up a Django project on the cloud (using Heroku)
Chrome Logger support for Django - 19 Stars, 0 Fork
Chrome-Logger support for Django. ChromeLogger is a protocol which allows sending logging messages to the browser. This module implements simple support for Django. It consists of two components: LoggingMiddleware, which is responsible for sending all log messages associated with the request to the browser, and ChromeLoggerHandler, a python logging handler which collects all messages.
django-chartjs-engine - 7 Stars, 0 Fork
An engine for creating chartjs javascript charts in django
Ned Batchelder
No PyCon for me this year
2017 will be different for me in one specific way: I won't be attending PyCon. I've been to ten in a row:
This year, the Open edX conference is in Madrid two days after PyCon, actually overlapping with the sprints. I'm not a good enough traveler to do both. Crossing nine timezones is not something to be taken lightly.
I'll miss the usual love-fest at PyCon, but after ten in a row, it should be OK to miss one. I can say that now, but probably in May I will feel like I am missing the party. Maybe I really will watch talks on video for a change.
I usually would be working on a presentation to give. I like making presentations, but it is a lot of work. This spring I'll have that time back.
In any case, this will be a new way to experience the Python community. See you all in 2018 in Cleveland!
January 05, 2017
Mike Driscoll
wxPython Cookbook Artist Interview: Liza Tretyakova
I always put a lot of thought into the covers of my books. For my first book on wxPython, I thought it would be fun to do a cookbook because I already had a lot of recipes on my blog. For the cover, my first thought was to have some kind of kitchen scene with mice cooks. Then I decided that was too obvious and went instead with the idea of an Old West cover with cowboys (or cow mice) cooking at a fire.
I asked Liza Tretyakova, my cover artist for wxPython Cookbook, to do a quick interview about herself. Here is what she had to say:
Can you tell us a little about yourself (hobbies, education, etc):
My name is Liza Tretyakova, I’m a freelance illustrator currently working in Moscow.
Education:
- Moscow State University, Faculty of History of Arts
- BA(Hons) Illustration, University of Plymouth
I have worked as an illustrator for about 10 years. I love horses and I used to have one. I’m also interested in archery. I like reading and spending a lot of time with my daughter Yara, who is 7 years old.
What motivated you to be an illustrator versus some other profession?
Since I was a child I have been drawing all the time and it just happened that I started to work as an illustrator, it turned into a profession.
What process do you go through when you are creating a new piece of art?
It is different every time, there is no specific “recipe” 
Do you have any advice for someone who wants to be an illustrator?
You should try to draw every day, the more the better.
Do you have anything else you would like to say?
It was a pleasure working with you!
Thanks so much for doing the interview and for agreeing to be my illustrator for my wxPython Cookbook.
You can see more of Liza’s work on Behance.
João Laia
Multiprocessing in Python via C
Python plays very well with both C and Fortran. It is relatively easy to extend it using these languages and to run very fast code from Python. Additionally, using the OpenMP API, it is easy to parallelize that code.
Eli Bendersky
Some notes on Luz - an assembler, linker and CPU simulator
A few years ago I wrote about Luz - a self-educational project to implement a CPU simulator and a toolchain for it, consisting of an assembler and a linker. Since then, I received some questions by email that made me realize I could do a better job explaining what the project is and what one can learn from it.
So I went back to the Luz repository and fixed it up to be more modern, in-line with current documentation standards on GitHub. The landing README page should now provide a good overview, but I also wanted to write up some less formal documentation I could point to - a place to show-off some of the more interesting features in Luz; a blog post seemed like the perfect medium for this.
As before, it makes sense to start with the Luz toplevel diagram:
Luz is a collection of related libraries and programs written in Python, implementing all the stages shown in the diagram above.
The CPU simulator
The Luz CPU is inspired by MIPS (for the instruction set), by Altera Nios II (for the way "peripherals" are attached to the CPU), and by MPC 555 (for the memory controller), and is aimed at embedded uses, like Nios II. The Luz user manual lists the complete instruction set, explaining what each instruction means.
The simulator itself is functional only - it performs the instructions one after the other, without trying to simulate how long their execution takes. It's not very remarkable and is designed to be simple and readable. The most interesting feature it has, IMHO, is how it maps "peripherals" and even CPU control registers into memory. Rather than providing special instructions or traps for OS system calls, Luz facilitates "bare-metal" programming (by which I mean, without an OS) by mapping "peripherals" into memory, allowing the programmer to access them by reading and writing special memory locations.
My inspiration here was soft-core embeddable CPUs like Nios II, which let you configure what peripherals to connect and how to map them. The CPU can be configured before it's loaded onto real HW, for example to attach as many SPI interfaces as needed. For Luz, to create a new peripheral and attach it to the simulator one implements the Peripheral interface:
class Peripheral(object):
    """ An abstract memory-mapped peripheral interface.

        Memory-mapped peripherals are accessed through memory
        reads and writes.

        The address given to reads and writes is relative to the
        peripheral's memory map.

        Width is 1, 2, 4 for byte, halfword and word accesses.
    """
    def read_mem(self, addr, width):
        raise NotImplementedError()

    def write_mem(self, addr, width, data):
        raise NotImplementedError()
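To make the interface concrete, here is a hypothetical peripheral in the spirit of the debug queue described below. The class name and behavior are illustrative, not one of Luz's actual classes:

```python
class DebugQueuePeripheral:
    """A hypothetical memory-mapped peripheral: every word written
    to it is appended to a queue that tests can later inspect.
    It implements the same read_mem/write_mem interface as Peripheral."""

    def __init__(self):
        self.queue = []

    def read_mem(self, addr, width):
        # Reads return the most recently written value, or 0 if empty.
        return self.queue[-1] if self.queue else 0

    def write_mem(self, addr, width, data):
        self.queue.append(data)


dq = DebugQueuePeripheral()
dq.write_mem(0x0, 4, 0x2A)
print(dq.read_mem(0x0, 4))  # 42
```

A simulator holding a mapping from address ranges to such objects can dispatch every load/store to the right peripheral, which is the essence of memory-mapped I/O.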
Luz implements some built-in features as peripherals as well; for example, the core registers (interrupt control, exception control, etc). The idea here is that embedded CPUs can have multiple custom "registers" to control various features, and creating dedicated names for them bloats instruction encoding (you need 5 bits to encode one of 32 registers, etc.); it's better to just map them to memory.
Another example is the debug queue - a peripheral useful for testing and debugging. It's a single word mapped to address 0xF0000 in the simulator. When the peripheral gets a write, it stores it in a special queue and optionally emits the value to stdout. The queue can later be examined. Here is a simple Luz assembly program that makes use of it:
# Counts from 0 to 9 [inclusive], pushing these numbers into the debug queue
    .segment code
    .global asm_main
    .define ADDR_DEBUG_QUEUE, 0xF0000

asm_main:
    li $k0, ADDR_DEBUG_QUEUE
    li $r9, 10              # r9 is the loop limit
    li $r5, 0               # r5 is the loop counter
loop:
    sw $r5, 0($k0)          # store loop counter to debug queue
    addi $r5, $r5, 1        # increment loop counter
    bltu $r5, $r9, loop     # loop back if not reached limit
    halt
Using the interactive runner to run this program we get:
$ python run_test_interactive.py loop_simple_debugqueue
DebugQueue: 0x0
DebugQueue: 0x1
DebugQueue: 0x2
DebugQueue: 0x3
DebugQueue: 0x4
DebugQueue: 0x5
DebugQueue: 0x6
DebugQueue: 0x7
DebugQueue: 0x8
DebugQueue: 0x9
Finished successfully...
Debug queue contents:
['0x0', '0x1', '0x2', '0x3', '0x4', '0x5', '0x6', '0x7', '0x8', '0x9']
Assembler
There's a small snippet of Luz assembly shown above. It's your run-of-the-mill RISC assembly, with the familiar set of instructions, fairly simple addressing modes and almost every instruction requiring registers (note how we can't store into the debug queue directly, for example, without dereferencing a register that holds its address).
The Luz user manual contains a complete reference for the instructions, including their encodings. Every instruction is a 32-bit word, with the 6 high bits for the opcode (meaning up to 64 distinct instructions are supported).
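Given that layout, pulling the opcode out of an instruction word is a one-liner; a small sketch (not Luz's actual decoder):

```python
def decode_opcode(word):
    """Extract the opcode from a 32-bit instruction word.
    Per the encoding described above, the opcode occupies the
    6 high bits, so up to 64 distinct instructions are possible."""
    return (word >> 26) & 0x3F


# Every 32-bit word maps to an opcode in [0, 63].
print(decode_opcode(0xFFFFFFFF))  # 63
print(decode_opcode(0x04000000))  # 1
```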
The code snippet also shows off some special features of the full Luz toolchain, like the special label asm_main. I'll discuss these later on in the section about linking.
Assembly languages are usually fairly simple to parse, and Luz is no exception. When I started working on Luz, I decided to use the PLY library for the lexer and parser mainly because I wanted to play with it. These days I'd probably just hand-roll a parser.
Luz takes another cool idea from MIPS - register aliases. While the assembler doesn't enforce any specific ABI on the coder, some conventions are very important when writing large assembly programs, and especially when interfacing with routines written by other programmers. To facilitate this, Luz designates register aliases for callee-saved registers and temporary registers.
For example, the general-purpose register number 19 can be referred to in Luz assembly as $r19 but also as $s1 - the callee-saved register 1. When writing standalone Luz programs, one is free to ignore these conventions. To get a taste of how ABI-conformant Luz assembly would look, take a look at this example.
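A toy resolver shows how aliases can coexist with plain $rNN names. Only the $s1 = register 19 pairing comes from the text; everything else here is illustrative, not Luz's actual alias table:

```python
# Hypothetical alias table; the post only states that $s1 names
# general-purpose register 19 (the full map is in the Luz docs).
REGISTER_ALIASES = {'$s1': 19}


def resolve_register(name):
    """Map an assembly register name ($r19, or an alias like $s1)
    to its register number."""
    if name in REGISTER_ALIASES:
        return REGISTER_ALIASES[name]
    if name.startswith('$r'):
        return int(name[2:])
    raise ValueError('unknown register: %s' % name)


print(resolve_register('$r19'))  # 19
print(resolve_register('$s1'))   # 19
```

An assembler can apply such a mapping during lexing, so the rest of the toolchain only ever sees register numbers.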
To be honest, ABI was on my mind because I was initially envisioning a full programming environment for Luz, including a C compiler. When you have a compiler, you must have some set of conventions for generated code like procedure parameter passing, saved registers and so on; in other words, the platform ABI.
Linker
In my view, one of the distinguishing features of Luz from other assembler projects out there is the linker. Luz features a full linker that supports creating single "binaries" from multiple assembly files, handling all the dirty work necessary to make that happen. Each assembly file is first "assembled" into a position-independent object file; these are glued together by the linker which applies the necessary relocations to resolve symbols across object files. The prime sieve example shows this in action - the program is divided into three .lasm files: two for subroutines and one for "main".
As we've seen above, the main subroutine in Luz is called asm_main. This is a special name for the linker (not unlike the _start symbol for modern Linux assemblers). The linker collects a set of object files produced by assembly, and makes sure to invoke asm_main from the special location 0x100000. This is where the simulator starts execution.
Luz also has the concept of object files. They are not unlike ELF images in nature: there's a segment table, an export table and a relocation table for each object, serving the expected roles. It is the job of the linker to make sense of this list of objects and correctly connect all call sites to final subroutine addresses.
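As a rough sketch of what such a linker does, here is a toy two-pass layout-and-patch loop. The data layout and the base address 0x100000 follow the text; nothing else is Luz's actual implementation:

```python
def link(objects, base=0x100000):
    """Toy linker: lay out each object's code sequentially, record
    the final address of each exported symbol, then patch every
    symbolic call site with its resolved address (relocation)."""
    addresses, address = {}, base
    # Pass 1: assign final addresses to symbols.
    for obj in objects:
        addresses[obj['name']] = address
        address += len(obj['code'])
    # Pass 2: apply relocations at call sites.
    image = []
    for obj in objects:
        for instr in obj['code']:
            if instr[0] == 'call':
                image.append(('call', addresses[instr[1]]))
            else:
                image.append(instr)
    return image


objects = [
    {'name': 'main', 'code': [('call', 'helper'), ('halt',)]},
    {'name': 'helper', 'code': [('ret',)]},
]
print(link(objects))  # [('call', 1048578), ('halt',), ('ret',)]
```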
Luz's standalone assembler can write an assembled image into a file in Intel HEX format, a popular format used in embedded systems to encode binary images or data in ASCII.
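For reference, here is a minimal encoder for a single Intel HEX data record, following the published format (a ':' start code, then byte count, 16-bit address, record type 00, data bytes, and a two's-complement checksum, all in hex). This is a sketch of the format, not Luz's writer:

```python
def ihex_data_record(address, data):
    """Encode one Intel HEX data record (record type 00).
    The checksum is the two's complement of the sum of all
    preceding bytes, modulo 256."""
    record = [len(data), (address >> 8) & 0xFF, address & 0xFF, 0x00] + list(data)
    checksum = (-sum(record)) & 0xFF
    return ':' + ''.join('%02X' % b for b in record + [checksum])


print(ihex_data_record(0x0100, [0x21, 0x46, 0x01, 0x36]))
# :04010000214601365D
```

A valid record has the property that all its bytes, checksum included, sum to zero modulo 256, which is how readers verify integrity.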
The linker was quite a bit of effort to develop. Since all real Luz programs are small, I didn't really need to break them up into multiple assembly files; but I really wanted to learn how to write a real linker :) Moreover, as already mentioned, my original plans for Luz included a C compiler, which would make a linker very helpful, since I'd need to link some "system" code into the user's program. Even today, Luz has some "startup code" it links into every image:
# The special segments added by the linker.
#   __startup: 3 words
#   __heap: 1 word
#
LINKER_STARTUP_CODE = string.Template(r'''
    .segment __startup
        LI $$sp, ${SP_POINTER}
        CALL asm_main
    .segment __heap
    .global __heap
    __heap:
        .word 0
''')
This code sets up the stack pointer to the initial address allocated for the stack, and calls the user's asm_main.
Debugger and disassembler
Luz comes with a simple program runner that will execute a Luz program (consisting of multiple assembly files); it also has an interactive mode - a debugger. Here's a sample session with the simple loop example shown above:
$ python run_test_interactive.py -i loop_simple_debugqueue
LUZ simulator started at 0x00100000
[0x00100000] [lui $sp, 0x13] >> set alias 0
[0x00100000] [lui $r29, 0x13] >> s
[0x00100004] [ori $r29, $r29, 0xFFFC] >> s
[0x00100008] [call 0x40003 [0x10000C]] >> s
[0x0010000C] [lui $r26, 0xF] >> s
[0x00100010] [ori $r26, $r26, 0x0] >> s
[0x00100014] [lui $r9, 0x0] >> s
[0x00100018] [ori $r9, $r9, 0xA] >> s
[0x0010001C] [lui $r5, 0x0] >> s
[0x00100020] [ori $r5, $r5, 0x0] >> s
[0x00100024] [sw $r5, 0($r26)] >> s
[0x00100028] [addi $r5, $r5, 0x1] >> s
[0x0010002C] [bltu $r5, $r9, -2] >> s
[0x00100024] [sw $r5, 0($r26)] >> s
[0x00100028] [addi $r5, $r5, 0x1] >> s
[0x0010002C] [bltu $r5, $r9, -2] >> s
[0x00100024] [sw $r5, 0($r26)] >> s
[0x00100028] [addi $r5, $r5, 0x1] >> r
$r0 = 0x00000000 $r1 = 0x00000000 $r2 = 0x00000000 $r3 = 0x00000000
$r4 = 0x00000000 $r5 = 0x00000002 $r6 = 0x00000000 $r7 = 0x00000000
$r8 = 0x00000000 $r9 = 0x0000000A $r10 = 0x00000000 $r11 = 0x00000000
$r12 = 0x00000000 $r13 = 0x00000000 $r14 = 0x00000000 $r15 = 0x00000000
$r16 = 0x00000000 $r17 = 0x00000000 $r18 = 0x00000000 $r19 = 0x00000000
$r20 = 0x00000000 $r21 = 0x00000000 $r22 = 0x00000000 $r23 = 0x00000000
$r24 = 0x00000000 $r25 = 0x00000000 $r26 = 0x000F0000 $r27 = 0x00000000
$r28 = 0x00000000 $r29 = 0x0013FFFC $r30 = 0x00000000 $r31 = 0x0010000C
[0x00100028] [addi $r5, $r5, 0x1] >> s 100
[0x00100030] [halt] >> q
There are many interesting things here demonstrating how Luz works:
- Note the start-up at 0x100000 - this is where Luz places the start-up segment - three instructions that set up the stack pointer and then call the user's code (asm_main). The user's asm_main starts running at the fourth instruction executed by the simulator.
- li is a pseudo-instruction, broken into two real instructions: lui for the upper half of the register, followed by ori for the lower half. The reason is that li takes a 32-bit immediate, which can't fit into a single 32-bit Luz instruction word; it's therefore split into two parts that each need only a 16-bit immediate. This trick is common in RISC ISAs.
- Jump labels are resolved to be relative by the assembler: the jump to loop is replaced by -2.
- Disassembly! The debugger shows the instruction decoded from every word where execution stops. Note how this exposes pseudo-instructions.
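The li expansion described in the second bullet can be sketched in a few lines (illustrative, not the assembler's actual code):

```python
def expand_li(reg, imm32):
    """Expand the li pseudo-instruction into lui + ori:
    lui loads the upper 16 bits of the register, then ori
    fills in the lower 16 bits."""
    upper = (imm32 >> 16) & 0xFFFF
    lower = imm32 & 0xFFFF
    return [('lui', reg, upper), ('ori', reg, reg, lower)]


# li $r9, 0x0000000A becomes lui $r9, 0x0 then ori $r9, $r9, 0xA,
# matching the debugger trace above.
print(expand_li('$r9', 0x0000000A))
```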
The in-progress RTL implementation
Luz was a hobby project, but an ambitious one :-) Even before I wrote the first line of the assembler or simulator, I started working on an actual CPU implementation in synthesizable VHDL, meaning to get a complete RTL image to run on FPGAs. Unfortunately, I didn't finish this part of the project and what you find in Luz's experimental/luz_uc directory is only 75% complete. The ALU is there, the registers, the hookups to peripherals, even parts of the control path - dealing with instruction fetching, decoding, etc. My original plan was to implement a pipelined CPU (a RISC ISA makes this relatively simple), which perhaps was a bit too much. I should have started simpler.
Conclusion
Luz was an extremely educational project for me. When I started working on it, I mostly had embedded programming experience and was just starting to get interested in systems programming. Luz flung me into the world of assemblers, linkers, binary images, calling conventions, and so on. Besides, Python was a new language for me at the time - Luz started just months after I first got into Python.
Its ~8000 lines of Python code are thus likely not my best Python code, but they should be readable and well commented. I did modernize it a bit over the years, for example to make it run on both Python 2 and 3.
I still hope to get back to the RTL implementation project one day. It's really very close to being able to run realistic assembly programs on real hardware (FPGAs). My dream back then was to fully close the loop by adding a Luz code generation backend to pycparser. Maybe I'll still fulfill it one day :-)
PyTennessee
PyTN Profiles: Brandon Wannamaker and Emma

Speaker Profile: Brandon Wannamaker (@huntgathergrow)
Brandon is a husband, hiker, computer nerd and backyard farmer. He’s currently a Quality Engineer at Emma in Nashville.
Brandon will be presenting “A Developer’s Guide to Full Stack Personal Maintenance” at 11:00AM Saturday (2/4) in Room 300. Developers spend a lot of time in front of a screen, not moving very much and often not eating good food. In this talk, I’ll share how yoga, my wife & bacon are helping me refactor & maintain my own personal stack.

Sponsor Profile: Emma (@emmaemail)
Emma is a provider of best-in-class email marketing software and services that help organizations of all sizes get more from their marketing. Through tailored editions of its platform for businesses, franchises, retailers, universities and agencies, Emma aims to offer enterprise-level capabilities in a team-friendly experience that’s simple and enjoyable. Key features include mobile-ready design templates, email automation, audience segmenting and dynamic content, plus integration with top CRM solutions, ecommerce platforms and social networks. Headquartered in Nashville, and with offices in Portland, New York and Melbourne, Emma powers the emails of more than 50,000 organizations worldwide, including Mario Batali, Bridgestone and Sylvan Learning Center. To learn more, visit myemma.com, follow Emma on Twitter or find us on Facebook.
Corey Oordt
The road to Docker, Django and Amazon ECS, part 4
Putting together a Dockerfile
I couldn't wait any longer, so I wanted to see it running in Docker!
Choosing a Linux distribution
We want the absolute smallest container we can get to run our project. The container is going to run Linux. We currently have Ubuntu on our servers, but default Ubuntu includes lots of stuff we don't need.
We chose Alpine Linux because it was small and had a large set of packages to install.
Setting up the Dockerfile
We based our Dockerfile on João Ferreira Loff's Alpine Linux Python 2.7 slim image.
FROM alpine:3.5
# Install needed packages. Notes:
# * dumb-init: a proper init system for containers, to reap zombie children
# * musl: standard C library
# * linux-headers: commonly needed, and an unusual package name from Alpine.
# * build-base: used so we include the basic development packages (gcc)
# * bash: so we can access /bin/bash
# * git: to ease up clones of repos
# * ca-certificates: for SSL verification during Pip and easy_install
# * python2: the binaries themselves
# * python2-dev: are used for gevent e.g.
# * py-setuptools: required only in major version 2, installs easy_install so we can install Pip.
# * build-base: used so we include the basic development packages (gcc)
# * linux-headers: commonly needed, and an unusual package name from Alpine.
# * python-dev: are used for gevent e.g.
# * postgresql-client: for accessing a PostgreSQL server
# * postgresql-dev: for building psycopg2
# * py-lxml: instead of using pip to install lxml, this is faster. Must make sure requirements.txt has correct version
# * libffi-dev: for compiling Python cffi extension
# * tiff-dev: For Pillow: TIFF support
# * jpeg-dev: For Pillow: JPEG support
# * openjpeg-dev: For Pillow: JPEG 2000 support
# * libpng-dev: For Pillow: PNG support
# * zlib-dev: For Pillow: zlib (compression) support
# * freetype-dev: For Pillow: TrueType support
# * lcms2-dev: For Pillow: Little CMS 2 support
# * libwebp-dev: For Pillow: WebP support
# * gdal: For some Geo capabilities
# * geos: For some Geo capabilities
ENV PACKAGES="\
dumb-init \
musl \
linux-headers \
build-base \
bash \
git \
ca-certificates \
python2 \
python2-dev \
py-setuptools \
build-base \
linux-headers \
python-dev \
postgresql-client \
postgresql-dev \
py-lxml \
libffi-dev \
tiff-dev \
jpeg-dev \
openjpeg-dev \
libpng-dev \
zlib-dev \
freetype-dev \
lcms2-dev \
libwebp-dev \
gdal \
geos \
"
RUN echo \
# replacing default repositories with edge ones
&& echo "http://dl-cdn.alpinelinux.org/alpine/edge/testing" > /etc/apk/repositories \
&& echo "http://dl-cdn.alpinelinux.org/alpine/edge/community" >> /etc/apk/repositories \
&& echo "http://dl-cdn.alpinelinux.org/alpine/edge/main" >> /etc/apk/repositories \
# Add the packages, with a CDN-breakage fallback if needed
&& apk add --no-cache $PACKAGES || \
(sed -i -e 's/dl-cdn/dl-4/g' /etc/apk/repositories && apk add --no-cache $PACKAGES) \
# make some useful symlinks that are expected to exist
&& if [[ ! -e /usr/bin/python ]]; then ln -sf /usr/bin/python2.7 /usr/bin/python; fi \
&& if [[ ! -e /usr/bin/python-config ]]; then ln -sf /usr/bin/python2.7-config /usr/bin/python-config; fi \
&& if [[ ! -e /usr/bin/easy_install ]]; then ln -sf /usr/bin/easy_install-2.7 /usr/bin/easy_install; fi \
# Install and upgrade Pip
&& easy_install pip \
&& pip install --upgrade pip \
&& if [[ ! -e /usr/bin/pip ]]; then ln -sf /usr/bin/pip2.7 /usr/bin/pip; fi \
&& echo
# Chaining the ENV allows for only one layer, instead of one per ENV statement
ENV HOMEDIR=/code \
LANG=en_US.UTF-8 \
LC_ALL=en_US.UTF-8 \
PYTHONUNBUFFERED=1 \
NEW_RELIC_CONFIG_FILE=$HOMEDIR/newrelic.ini \
GUNICORNCONF=$HOMEDIR/conf/docker_gunicorn_conf.py \
GUNICORN_WORKERS=2 \
GUNICORN_BACKLOG=4096 \
GUNICORN_BIND=0.0.0.0:8000 \
GUNICORN_ENABLE_STDIO_INHERITANCE=True \
DJANGO_SETTINGS_MODULE=settings
WORKDIR $HOMEDIR
# Copying this file over so we can install requirements.txt in one cache-able layer
COPY requirements.txt $HOMEDIR/
RUN pip install --upgrade pip \
&& pip install -r $HOMEDIR/requirements.txt
# Copy the code
COPY . $HOMEDIR
EXPOSE 8000
CMD ["sh", "-c", "$HOMEDIR/docker-entrypoint.sh"]
The first change that we made was to use Alpine Linux version 3.5, which has just been released.
Next we listed all the OS-level packages we'll need in the PACKAGES environment variable.
The next RUN statement sets the package repositories to the edge version, installs the packages in PACKAGES, creates a few convenience symlinks, and installs pip for our Python installs.
We set up all the environment variables next.
After setting the working directory, we copy our requirements.txt file into the container and install all our requirements. We do this step separately so it creates a cached layer that won't change unless the requirements.txt file changes. This saves tons of time if you keep building and re-building the image.
We copy all our code over to the container, tell the container to expose port 8000 and specify the command to run (unless we specify a different command at runtime).
You'll notice that the command looks strange. Because of the way Docker executes commands, it can't substitute the environment variable HOMEDIR directly. So we have to wrap our command, $HOMEDIR/docker-entrypoint.sh, in sh -c.
But there's something missing
You'll notice that in this version there aren't any environment variables for the database, cache, or any of the other variables we set up earlier. We'll get them in there eventually, but for right now, we want to see if we can build and run this container and have it connect to our local database and cache.
If you build it, it can run
Building the docker image is really easy:
docker build -t ngs:latest .
This tags the built image as ngs:latest, which isn't what we are going to do in production, but it helps when testing everything.
The output looks something like this:
$ docker build -t ngs:latest .
Sending build context to Docker daemon 76.43 MB
Step 1 : FROM alpine:3.5
---> 88e169ea8f46
Step 2 : ENV PACKAGES " dumb-init musl linux-headers build-base bash git ca-certificates python2 python2-dev py-setuptools build-base linux-headers python-dev postgresql-client postgresql-dev py-lxml libffi-dev tiff-dev jpeg-dev openjpeg-dev libpng-dev zlib-dev freetype-dev lcms2-dev libwebp-dev gdal geos "
---> Using cache
---> 184f9b7e79f9
Step 3 : RUN echo && echo "http://dl-cdn.alpinelinux.org/alpine/edge/testing" > /etc/apk/repositories && echo "http://dl-cdn.alpinelinux.org/alpine/edge/community" >> /etc/apk/repositories && echo "http://dl-cdn.alpinelinux.org/alpine/edge/main" >> /etc/apk/repositories && apk add --no-cache $PACKAGES || (sed -i -e 's/dl-cdn/dl-4/g' /etc/apk/repositories && apk add --no-cache $PACKAGES) && if [[ ! -e /usr/bin/python ]]; then ln -sf /usr/bin/python2.7 /usr/bin/python; fi && if [[ ! -e /usr/bin/python-config ]]; then ln -sf /usr/bin/python2.7-config /usr/bin/python-config; fi && if [[ ! -e /usr/bin/easy_install ]]; then ln -sf /usr/bin/easy_install-2.7 /usr/bin/easy_install; fi && easy_install pip && pip install --upgrade pip && if [[ ! -e /usr/bin/pip ]]; then ln -sf /usr/bin/pip2.7 /usr/bin/pip; fi && echo
---> Using cache
---> 514dcc2f010d
Step 4 : ENV HOMEDIR /code LANG en_US.UTF-8 LC_ALL en_US.UTF-8 PYTHONUNBUFFERED 1 NEW_RELIC_CONFIG_FILE $HOMEDIR/newrelic.ini GUNICORNCONF $HOMEDIR/conf/docker_gunicorn_conf.py GUNICORN_WORKERS 2 GUNICORN_BACKLOG 4096 GUNICORN_BIND 0.0.0.0:8000 GUNICORN_ENABLE_STDIO_INHERITANCE True DJANGO_SETTINGS_MODULE settings
---> Running in 2d58f77c0a8e
---> 1342bb501c0f
Removing intermediate container 2d58f77c0a8e
Step 5 : WORKDIR $HOMEDIR
---> Running in a20a2fa64d2e
---> df977d30491c
Removing intermediate container a20a2fa64d2e
Step 6 : COPY requirements.txt $HOMEDIR/
---> e6ae37797b36
Removing intermediate container 820e3406fb5c
Step 7 : RUN pip install --upgrade pip && pip install -r $HOMEDIR/requirements.txt
---> Running in 4c65be60af03
Requirement already up-to-date: pip in /usr/lib/python2.7/site-packages/pip-9.0.1-py2.7.egg
Collecting beautifulsoup4==4.5.1 (from -r /code/requirements.txt (line 2))
Downloading beautifulsoup4-4.5.1-py2-none-any.whl (83kB)
Collecting cmsplugin-forms-builder==1.1.1 (from -r /code/requirements.txt (line 3))
...
Installing collected packages: beautifulsoup4, Django, ...
Running setup.py install for future: started
Running setup.py install for future: finished with status 'done'
Installing from a newer Wheel-Version (1.1)
Running setup.py install for unidecode: started
Running setup.py install for unidecode: finished with status 'done'
Successfully installed Django-1.8.15 Fabric-1.10.2 ...
---> 165f7ae9507e
Removing intermediate container 4c65be60af03
Step 8 : COPY . $HOMEDIR
---> 1058d14b462f
Removing intermediate container 55f77f2e60d6
Step 9 : EXPOSE 8000
---> Running in 38e8c650a529
---> 7c53dcf41f2a
Removing intermediate container 38e8c650a529
Step 10 : CMD sh -c $HOMEDIR/docker-entrypoint.sh
---> Running in 1b8781bf6458
---> a255a40e30b8
Removing intermediate container 1b8781bf6458
Successfully built a255a40e30b8
I've truncated most of the output from installing the Python dependencies. If I run it again, steps 6 and 7 use the existing cache:
Step 6 : COPY requirements.txt $HOMEDIR/
---> Using cache
---> e6ae37797b36
Step 7 : RUN pip install --upgrade pip && pip install -r $HOMEDIR/requirements.txt
---> Using cache
---> 165f7ae9507e
If I make changes to any other part of our project, steps 1-7 use the cache, and it only has to copy over the new code.
How big is it?
So how big is the container? Running docker images gives us:
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
ngs                 latest              a255a40e30b8        11 minutes ago      590.1 MB
So 590.1 MB. What makes up that space? We can take a look at the layers created by our Dockerfile. Running docker history ngs:latest returns:
IMAGE CREATED CREATED BY SIZE COMMENT
a255a40e30b8 7 minutes ago /bin/sh -c #(nop) CMD ["sh" "-c" "$HOMEDIR/d 0 B
7c53dcf41f2a 7 minutes ago /bin/sh -c #(nop) EXPOSE 8000/tcp 0 B
1058d14b462f 7 minutes ago /bin/sh -c #(nop) COPY dir:0da094a2328f4e5bfb 73.69 MB
165f7ae9507e 7 minutes ago /bin/sh -c pip install --upgrade pip && pip 227.1 MB
e6ae37797b36 11 minutes ago /bin/sh -c #(nop) COPY file:25e352c295f212113 3.147 kB
df977d30491c 11 minutes ago /bin/sh -c #(nop) WORKDIR /code 0 B
1342bb501c0f 11 minutes ago /bin/sh -c #(nop) ENV HOMEDIR=/code LANG=en_ 0 B
514dcc2f010d 3 days ago /bin/sh -c echo && echo "http://dl-cdn.alpi 285.3 MB
184f9b7e79f9 3 days ago /bin/sh -c #(nop) ENV PACKAGES= dumb-init 0 B
88e169ea8f46 6 days ago /bin/sh -c #(nop) ADD file:92ab746eb22dd3ed2b 3.984 MB
At the bottom layer is the Alpine Linux 3.5 distro, which is only 3.984 MB. Our OS-level packages take up 285.3 MB. Our Python dependencies take up 227.1 MB. Our code is 73.69 MB.
Make it run! Make it run!
We want this container to connect to resources running on our local computer.
Make PostgreSQL and Redis listen more
My default installations of Redis and PostgreSQL only listen for connections on the loopback address. I modified them to listen on every interface.
Now my container will be able to connect to them.
Give the container the address
The container has no idea where it is running. Typically, all the connections are made when Docker sets up the containers (and that is what we want, eventually). We need to tell the container where it is running.
We are going to do this with a temporary script called docker-run.sh:
#!/bin/bash
export DOCKERHOST=$(ifconfig | grep -E "([0-9]{1,3}\.){3}[0-9]{1,3}" | grep -v 127.0.0.1 | awk '{ print $2 }' | cut -f2 -d: | head -n1)
docker rm ngs-container
docker run -ti \
    -p 8000:8000 \
    --add-host dockerhost:$DOCKERHOST \
    --name ngs-container \
    -e DATABASE_URL=postgresql://coordt:password@dockerhost:5432/education \
    -e CACHE_URL=rediscache://dockerhost:6379/0?CLIENT_CLASS=site_ext.cacheclient.GracefulClient \
    ngs:latest
The first line sets the DOCKERHOST environment variable to the local computer's current IP address.
The second line removes any existing containers named ngs-container. Note: Docker doesn't clean up after itself very well. This is very well known, and there are several different solutions, I'm sure. After doing some Docker building and running, you end up with lots of unused images and containers. This script attempts to remove old containers by naming the container ngs-container each time.
The last line tells Docker to run the ngs:latest image with a pseudo-tty and interactivity (-ti), maps container port 8000 to local port 8000 (-p 8000:8000), adds dockerhost to the container's /etc/hosts file with the local computer's current IP address (--add-host dockerhost:$DOCKERHOST), names the container ngs-container (--name ngs-container), and sets the DATABASE_URL and CACHE_URL environment variables.
Now, make docker-run.sh executable with a chmod a+x, and you can run it.
$ ./docker-run.sh
Copying '/code/static/concepts/jquery-textext.js'
Copying '/code/static/autocomplete_light/addanother.js'
...
Post-processed 'js/tiny_mce/plugins/inlinepopups/skins/clearlooks2/img/alert.gif' as 'js/tiny_mce/plugins/inlinepopups/skins/clearlooks2/img/alert.568d4cf84413.gif'
Post-processed 'js/tiny_mce/plugins/inlinepopups/skins/clearlooks2/img/corners.gif' as 'js/tiny_mce/plugins/inlinepopups/skins/clearlooks2/img/corners.55298b5baaec.gif'
...
4256 static files copied to '/code/staticmedia', 4256 post-processed.
Operations to perform:
Synchronize unmigrated apps: redirects, ...
Apply all migrations: teachingatlas, ...
Synchronizing apps without migrations:
Creating tables...
Running deferred SQL...
Installing custom SQL...
Running migrations:
No migrations to apply.
If you remember from a previous post, the docker-entrypoint.sh runs two commands before it starts gunicorn.
The first is collecting (and post-processing) the static media. I've truncated the output for copying and the post-processing of said static media, but you can see that it ran.
The next is a database migration. I've truncated the output somewhat, but you can see that nothing was required to migrate.
Now when I try http://localhost:8000, I get a web page! Success!
Next time
In the next installment I'll get the container serving its own static files.
Yoong Kang Lim
Event sourcing in Django
Django comes with "batteries included" to make CRUD (create, read, update, delete) operations easy. It's nice that the CR part (create and read) of CRUD is so easy, but have you ever paused to think about the UD part (update and delete)?
Let's look at delete. All you need to do is this:
ReallyImportantModel.objects.get(id=32).delete() # gone from the database forever
Just one line, and your data is gone forever. It can be done accidentally. Or you can do it deliberately, only to later realise that your old data is valuable too.
Now what about updating?
Updating is deleting in disguise.
When you update, you're deleting the old data and replacing it with something new. It's still deletion.
important = ReallyImportantModel.objects.filter(id=32)
important.update(data={'new_data': 'This is new data'})  # OLD DATA GONE FOREVER
Okay, but why do we care?
Let's say we want to know the state of ReallyImportantModel 6 months ago. Oh that's right, you've deleted it, so you can't get it back.
Well, that's not exactly true -- you can recreate your data from backups (if you don't back up your database, stop reading right now and fix that immediately). But that's clumsy.
So by only storing the current state of the object, you lose all the contextual information on how the object arrived at this current state. Not only that, you make it difficult to make projections about the future.
Event sourcing 1 can help with that.
Event sourcing
The basic concept of event sourcing is this:
- Instead of just storing the current state, we also store the events that lead up to the current state
- Events are replayable. We can travel back in time to any point by replaying every event up to that point in time
- That also means we can recover the current state just by replaying every event, even if the current state was accidentally deleted
- Events are append-only.
To gain an intuition, let's look at an event sourcing system you're familiar with: your bank account.
Your "state" is your account balance, while your "events" are your transactions (deposit, withdrawal, etc.).
Can you imagine a bank account that only shows you the current balance?
That is clearly unacceptable ("Why do I only have $50? Where did my money go? If only I could see the history."). So we always store the history of transfers as the source of truth.
Implementing event sourcing in Django
Let's look at a few ways to do this in Django.
Ad-hoc models
If you have one or two important models, you probably don't need a generalized event sourcing solution that applies to all models.
You could do it on an ad-hoc basis like this, provided a relationship between the models makes sense:
# in an app called 'account'
from django.db import models
from django.conf import settings


class Account(models.Model):
    """Bank account"""
    balance = models.DecimalField(max_digits=19, decimal_places=6)
    owner = models.ForeignKey(settings.AUTH_USER_MODEL, related_name='account')


class Transfer(models.Model):
    """
    Represents a transfer in or out of an account. A positive amount indicates
    that it is a transfer into the account, whereas a negative amount indicates
    that it is a transfer out of the account.
    """
    account = models.ForeignKey('account.Account', on_delete=models.PROTECT,
                                related_name='transfers')
    amount = models.DecimalField(max_digits=19, decimal_places=6)
    date = models.DateTimeField()
In this case your "state" is in your Account model, whereas your Transfer model contains the "events".
Having Transfer objects makes it trivial to recreate any account.
Using an Event Store
You could also use a single Event model to store every possible event in any model. A nice way to do this is to encode the changes in a JSON field.
This example uses Postgres:
from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.contrib.postgres.fields import JSONField
from django.db import models


class Event(models.Model):
    """Event table that stores all model changes"""
    content_type = models.ForeignKey(ContentType, on_delete=models.PROTECT)
    object_id = models.PositiveIntegerField()
    time_created = models.DateTimeField()
    content_object = GenericForeignKey('content_type', 'object_id')
    body = JSONField()
You can then add methods that mutate the state to any model:
import json

from django.conf import settings
from django.db import models
from django.utils import timezone


class Account(models.Model):
    balance = models.DecimalField(max_digits=19, decimal_places=6, default=0)
    owner = models.ForeignKey(settings.AUTH_USER_MODEL, related_name='account')

    def make_deposit(self, amount):
        """Deposit money into account"""
        Event.objects.create(
            content_object=self,
            time_created=timezone.now(),
            body=json.dumps({
                'type': 'made_deposit',
                'amount': float(amount),  # Decimal is not JSON-serializable
            })
        )
        self.balance += amount
        self.save()

    def make_withdrawal(self, amount):
        """Withdraw money from account"""
        Event.objects.create(
            content_object=self,
            time_created=timezone.now(),
            body=json.dumps({
                'type': 'made_withdrawal',
                'amount': -float(amount),  # withdrawal = negative amount
            })
        )
        self.balance -= amount
        self.save()

    @classmethod
    def create_account(cls, owner):
        """Create an account"""
        account = cls.objects.create(owner=owner, balance=0)
        Event.objects.create(
            content_object=account,
            time_created=timezone.now(),
            body=json.dumps({
                'type': 'created_account',
                'id': account.id,
                'owner_id': owner.id,
            })
        )
        return account
So now you can do this:
account = Account.create_account(owner=User.objects.first())
account.make_deposit(decimal.Decimal('50.0'))  # pass strings to Decimal, not floats
account.make_deposit(decimal.Decimal('125.0'))
account.make_withdrawal(decimal.Decimal('75.0'))

events = Event.objects.filter(
    content_type=ContentType.objects.get_for_model(account),
    object_id=account.id,
)
for event in events:
    print(event.body)
Which should give you this:
{"type": "created_account", "id": 2, "owner_id": 1}
{"type": "made_deposit", "amount": 50.0}
{"type": "made_deposit", "amount": 125.0}
{"type": "made_withdrawal", "amount": -75}
Again, this makes it trivial to write utility methods that recreate any instance of Account, even if you accidentally dropped the whole accounts table.
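As a sketch of such a utility, here is a hypothetical, Django-free replay function that folds the stored event bodies (the JSON strings printed above) back into an account state; in the real app you would feed it `event.body` values ordered by `time_created`:

```python
import json
from decimal import Decimal

def replay_account(event_bodies):
    """Rebuild an account's state dict by applying each stored event in order."""
    state = None
    for raw in event_bodies:
        event = json.loads(raw)
        if event["type"] == "created_account":
            state = {"id": event["id"], "owner_id": event["owner_id"],
                     "balance": Decimal("0")}
        elif event["type"] in ("made_deposit", "made_withdrawal"):
            # Amounts were stored signed, so both event types just add.
            state["balance"] += Decimal(str(event["amount"]))
    return state

events = [
    '{"type": "created_account", "id": 2, "owner_id": 1}',
    '{"type": "made_deposit", "amount": 50.0}',
    '{"type": "made_deposit", "amount": 125.0}',
    '{"type": "made_withdrawal", "amount": -75}',
]
print(replay_account(events)["balance"])  # 100.0
```

Each event type gets its own handler branch, so adding a new kind of event only means adding a new branch.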
Snapshotting
There will come a time when you have too many events to efficiently replay the entire history. In that case, a good optimisation is to take snapshots at various points in history. In our accounting example, you could periodically save the account's state in an AccountBalance model, each row capturing the balance at a point in time.
You could do this via a scheduled task. Celery 2 is a good option.
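The payoff is that replay then only has to cover the events recorded since the last snapshot. A toy, Django-free sketch of the idea (the snapshot value stands in for a row of the hypothetical AccountBalance model above):

```python
from decimal import Decimal

def balance_from_snapshot(snapshot_balance, amounts_since_snapshot):
    """Replay only the transfers recorded after the snapshot was taken."""
    balance = snapshot_balance
    for amount in amounts_since_snapshot:
        balance += amount
    return balance

# Start from last night's snapshot instead of replaying the full history.
print(balance_from_snapshot(Decimal("100.0"), [Decimal("25.0"), Decimal("-10.0")]))  # 115.0
```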
Summary
Use event sourcing to maintain an append-only list of events for your critical data. This effectively allows you to travel in time to any point in history to see the state of your data at that time.
UPDATE: If you want to see an example repo, feel free to take a look here: https://github.com/yoongkang/event_sourcing_example
1. Martin Fowler wrote a detailed description of event sourcing on his website: http://martinfowler.com/eaaDev/EventSourcing.html ↩
2. Celery project. http://www.celeryproject.org/ ↩
Python Software Foundation
"Weapons of Math Destruction" by Cathy O'Neil
In a 1947 lecture on computing machinery, Alan Turing made a prediction: "The new machines will in no way replace thought, but rather they will increase the need for it."
Someday, he said, machines would think for themselves, but the computers of the near future would require human supervision to prevent malfunctions:
"The intention in constructing these machines in the first instance is to treat them as slaves, giving them only jobs which have been thought out in detail, jobs such that the user of the machine fully understands in principle what is going on all the time." 1
It is unclear now whether machines remain slaves, or if they are beginning to be masters. Machine-learning algorithms pervasively control the lives of Americans. We do not fully understand what they do, and when they malfunction they harm us, by reinforcing the unjust systems we already have. Usually unintentionally, they can make the lives of poor people and people of color worse.
In "Weapons of Math Destruction", Cathy O'Neil identifies such an algorithm as a "WMD" if it satisfies three criteria: it makes decisions of consequence for a large number of people, it is opaque and unaccountable, and it is destructive. I interviewed O'Neil to learn what data scientists should do to disarm these weapons.
Automated Injustice
Recidivism risk models are a striking example of algorithms that reinforce injustice. These algorithms purport to predict how likely a convict is to commit another crime in the next few years. The model described in O'Neil's book, called LSI-R, assesses offenders with 54 questions, then produces a risk score based on correlations between each offender's characteristics and the characteristics of recidivists and non-recidivists in a sample population of offenders.
Some of LSI-R's factors measure the offender's past behavior: Has she ever been expelled from school, or violated parole? But most factors probably aren't under the individual's control: Does she live in a high-crime neighborhood? Is she poor? And many factors are not under her control at all: Has a family member been convicted of any crimes? Did her parents raise her with a "rewarding" parenting style?
Studies of LSI-R show it gives worse scores to poor black people. Some of its questions directly measure poverty, and others (such as frequently changing residence) are proxies for poverty. LSI-R does not know the offender's race. It would be illegal to ask, but, O'Neil writes, "with the wealth of detail each prisoner provides, that single illegal question is almost superfluous." For example, it asks the offender's age when he was first involved with the police. O'Neil cites a 2013 New York Civil Liberties Union study showing that young black and Hispanic men were ten times as likely to be stopped by the New York City police, even though only a tiny fraction of them were doing anything criminal.
None of this automatically makes the LSI-R destructive. If it is accurate, and used for benign choices like spending more time treating and counselling offenders with high risk scores, it could do some good. But in many states, judges use the LSI-R and models like it to decide how long the offender's sentence should be. This is not LSI-R's intended use, and the model is certainly not accurate enough for that purpose: a study this year found that LSI-R misclassified 41% of offenders. 2
Success, According to Whom?
O'Neil told me that whether an algorithm becomes a WMD depends on how success is defined, and for whom. "Over and over again, people act as if there's only one set of stakeholders."
When a recidivism risk model is used to sentence someone to a longer prison term, the sole stakeholder respected is law enforcement. "Law enforcement cares more about true positives, correctly identifying someone who will reoffend and putting them in jail for longer to keep them from committing another crime." But our society has a powerful interest in preventing false positives. Indeed, we were founded on a constitution that considered a false positive—that is, being punished for a crime you did not commit—to be extremely costly. Principles including the presumption of innocence, the requirement that guilt is proven beyond reasonable doubt, and so on, express our desire to avoid unjust punishment, even at the cost of some criminals being punished too little or going free.
However, this interest is ignored when an offender is punished for a bad LSI-R score. His total sentence accounts not only for the crime he committed, but also for future crimes he is thought likely to commit. Furthermore, he is punished for who he is: Being related to a criminal or being raised badly are circumstances of birth, but for many people facing sentencing, such circumstances are used to add years to their time behind bars.
Statistically Unsound
Cathy O'Neil says weapons of math destruction are usually caused by two failures. The first is when only one stakeholder's interests define success. LSI-R is an example of this. The other is a lack of actual science in data science. For these algorithms, she told me, "We actually don't have reasonable ways of checking to see whether something is working or not."
A New York City public school program begun in 2007 assessed teachers with a "value added model", which estimated how much a teacher affected each student's progress on standardized tests. To begin, the model forecast students' progress, given their neighborhood, family income, previous achievement, and so on. At the end of the year their actual progress was compared to the forecast, and the difference was attributed to the teacher's effectiveness. O'Neil tells the story of Tim Clifford, a public school teacher who scored only 6 out of 100 the first year he was assessed, then 96 out of 100 the next year. O'Neil writes, "Attempting to score a teacher's effectiveness by analyzing the test results of only twenty-five or thirty students is statistically unsound, even laughable." One analysis of the assessment showed that a quarter of teachers' scores swung by 40 points in a year. Another showed that, with such small samples, the margin of error made half of all teachers statistically indistinguishable.
Nevertheless, the score might determine if the teacher was given a bonus, or fired. Although its decision was probabilistic, appealing it required conclusive evidence. O'Neil points out that time and again, "the human victims of WMDs are held to a higher standard of evidence than the algorithms themselves." The model is math so it is presumed correct, and anyone who objects to its scores is suspect.
New York Governor Andrew Cuomo put a moratorium on these teacher evaluations in 2015. We are starting to see that some questions require too subtle an intelligence for our current algorithms to answer accurately. As Alan Turing said, "If a machine is expected to be infallible, it cannot also be intelligent."
Responsible Data Science
I asked Cathy O'Neil about the responsibilities of data scientists, both in their daily work and as reformers of their profession. Regarding daily work, O'Neil drew a sharp line: "I don't want data scientists to be de facto policy makers." Rather, their job is to explain to policy makers the moral tradeoffs of their choices. Just as any programmer gathers requirements before coding a solution, data scientists should gather requirements regarding the relative cost of different kinds of errors. Machine learning algorithms are always imperfect, but they can be tweaked for either more false positives or more false negatives. When the stakes are high, the choice between the two is a moral one. Data scientists must pose these questions frankly to policy makers, says O'Neil, and "translate moral decisions into code."
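A toy illustration of that tweak, with invented numbers (not from the book): moving a classifier's decision threshold trades false positives against false negatives, and choosing where to set it is exactly the kind of requirement that should come from policy makers, not slip in as a default.

```python
def classify(scores, threshold):
    """Label each risk score positive when it meets the threshold."""
    return [score >= threshold for score in scores]

scores = [0.2, 0.4, 0.6, 0.8]       # model outputs (invented)
truth = [False, True, False, True]  # actual outcomes (invented)

for threshold in (0.3, 0.7):
    predicted = classify(scores, threshold)
    false_pos = sum(p and not t for p, t in zip(predicted, truth))
    false_neg = sum(t and not p for p, t in zip(predicted, truth))
    print(f"threshold {threshold}: {false_pos} false positive(s), "
          f"{false_neg} false negative(s)")
```

Here the low threshold produces a false positive and the high one a false negative; neither setting is "correct" without first deciding which error costs more.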
Tradeoffs in the private sector often pit corporate interests against human ones. This is especially dangerous to the poor because, as O'Neil writes, "The privileged are processed more by people, the masses by machines." She told me that when the boss asks for an algorithm that optimizes for profit, it is the data scientist's duty to mention that the algorithm should also consider fairness.
"Weapons of Math Destruction" tells us how to recognize a WMD once it is built. But how can we predict whether an algorithm will become a WMD? O'Neil told me, "The biggest warning sign is if you're choosing winners and losers, and if it's a big deal for losers to lose. If it's an important decision and it's a secret formula, then that's a set-up for a weapon of math destruction. The only other ingredient you need in that setup is actually making it destructive."
Reform
Cathy O'Neil says the top priority, for data scientists who want to disarm WMDs, is to develop tools for analyzing them. For example, any EU citizen harmed by an algorithmic decision may soon have the legal right to an explanation, but so far we lack the tools to provide one. We also need tools to measure disparate impact and unfairness. O'Neil says, "We need tools to decide whether an algorithm is being racist."
New data scientists should enter the field with better training in ethics. Curricula usually ignore questions of justice, as if the job of the data scientist were purely technical. Data-science contests like Kaggle also encourage this view, says O'Neil. "Kaggle has defined the success and the penalty function. The hard part of data science is everything that happens before Kaggle." O'Neil wants more case studies from the field, anonymized so students can learn from them how data science is really practiced. It would be an opportunity to ask: When an algorithm makes a mistake, who gets hurt?
If data scientists take responsibility for the effects of their work, says O'Neil, they will become activists. "I'm hoping the book, at the very least, gets people to acknowledge the power that they're wielding," she says, "and how it could be used for good or bad. The very first thing we have to realize is that well-intentioned people can make horrible mistakes."
1. Quoted in "Alan Turing: The Enigma", by Andrew Hodges. Princeton University Press. ↩
2. See also ProPublica's analysis of bias in a similar recidivism model, COMPAS. ↩
Sylvain Hellegouarch
ws4py is eager for a new maintainer
Years ago, I got really interested in the WebSocket protocol that eventually landed as RFC 6455. It was the first time I had spent so much time participating in the elaboration of a protocol. Obviously, my role was minor, but the whole experience was both frustrating and exhilarating.
I thought the protocol was small enough to be a good scope for a Python project: ws4py. Aside from implementing the core protocol, what I was interested in was twofold. First, relying on Python generators all the way down as a control flow mechanism. Second, I wanted to decouple the socket layer from the interface so I could more easily write tests through fake WebSocket objects. Indeed, as with any network protocol, the devil is in the details, and WebSocket is no different. There are a few corner cases that would have been hard to test with a socket but were trivial with an interface.
Did I succeed? Partly. In hindsight, my design was not perfect and I made a couple of mistakes:
- I relied too much on an OOP design for the high-level interface. I quickly realised I could, and should, have used a more functional approach. After all, WebSockets are merely event handlers, and a functional approach would have made more sense. With that said, Python is not a functional language; it has a few functional features, but the ecosystem is not entirely driven that way, so, at the time, a functional design might have made library adoption more difficult.
- I sort of wrote my internal event loop abstraction on top of select, epoll… One goal I had set myself was not to rely on any external dependency for the library; at the least, I wanted to make sure the library could be used as-is for quick and dirty tests. That's a bad idea for something as complex as proper event looping. Eventually, I did provide support for gevent and asyncio (called tulip back then). Still, there is a chunk of the code that could be made simpler and more robust if we changed that.
- Though I did decouple the socket from the interface and provided a nicer way of testing the protocol, the socket interface still leaks here and there making the code not as tight as it could be.
I started that project years ago and haven't really paid attention to it for the last two years. Some folks using the library have been getting restless, and no doubt frustrated, by my lack of commitment to it. I want to be honest: I've lost interest in the project, I've moved on to other puzzles that have piqued my curiosity, and I don't have the energy for ws4py any longer.
Is the project still relevant anyway? Well, GitHub tells me it is starred by almost 800 individuals. That's not massive but it's decent for sure; it's the highest count of any of my projects. Also, it's the only WebSocket library that runs with CherryPy.
Is WebSocket still relevant? That's not for me to say. Some people claim it's dying a slow death due to HTTP/2. Honestly, I have no idea. WebSocket, much like the Atom Publishing Protocol (another protocol I cared for), didn't do as well as its authors may have initially expected.
Anyhow, I think I should pass the baton to someone who is motivated to take the project on. I'm not sure how this will happen, but I would like to think we can be as transparent as possible about it. Please use the comments below or the mailing-list to discuss your interest.
If by the end of February no one has shown any interest, I will officially deprecate the project so people can gradually move on. The project will stay on GitHub, but it will be clear that no further changes or releases will be made.
It's sad to let go of a project you cared for, but it's only fair to the community to be transparent when you've lost the energy for it. Long live ws4py!
Vasudev Ram
Give your Python function a {web,CLI} hug!
By Vasudev Ram
I came across this interesting Python framework called hug recently:
www.hug.rest
Hug is interesting because it allows you to create a function in Python and then expose it via both the web and the command line. It also does some data type validation using Python 3's annotations (not shown in my example, but see the hug quickstart below). Hug is Python 3 only, and builds on the Falcon web framework (which is "a low-level high performance framework" for, among other things, "building other frameworks" :).
Here is the hug quickstart.
The hug site says it is "compiled with Cython" for better performance, and it makes some claims about being one of the fastest Python frameworks. I haven't checked those claims out.
Here is an HN thread about hug from about a year ago, with some interesting comments, in which the author of hug also participated, replying to readers' questions and explaining some of his claims. Some benchmark results for hug vs. other tools are also linked to in that thread.
I tried out some of the features of hug, using Python 3.5.2, with a small program I wrote.
Below is the test program I wrote, hug_pdp.py. The pdp in the filename stands for psutil disk partitions, because it uses the psutil disk_partitions() function that I blogged about here recently:
Using psutil to get disk partition information with Python
Here is hug_pdp.py (note the use of hug decorators in the code to enable different kinds of user interfaces and HTTP methods):
"""
hug_pdp.py
Use hug with psutil to show disk partition info
via Python, CLI or Web interfaces.
Copyright 2017 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: http://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
"""
import sys
import psutil
import hug
def get_disk_partition_data():
    dps = psutil.disk_partitions()
    fmt_str = "{:<8} {:<7} {:<7}"
    result = {}
    result['header'] = fmt_str.format("Drive", "Type", "Opts")
    result['detail'] = {}
    for i in (0, 2):
        dp = dps[i]
        result['detail'][str(i)] = fmt_str.format(dp.device, dp.fstype, dp.opts)
    return result


@hug.cli()
@hug.get(examples='drives=0,1')
@hug.local()
def pdp():
    """Get disk partition data"""
    result = get_disk_partition_data()
    return result


@hug.cli()
@hug.get(examples='')
@hug.local()
def pyver():
    """Get Python version"""
    pyver = sys.version[:6]
    return pyver


if __name__ == '__main__':
    pdp.interface.cli()
Here are some different ways of running this hug-enabled program, with their outputs:
As a regular Python command-line program, using the python command:
$ python hug_pdp.py
{'detail': {'0': 'C:\\ NTFS rw,fixed', '2': 'E:\\ CDFS ro,cdrom'
}, 'header': 'Drive Type Opts '}
As a command-line program, using the hug command:
$ hug -f hug_pdp.py -c pdp
{'detail': {'2': 'E:\\ CDFS ro,cdrom', '0': 'C:\\ NTFS rw,fixed'},
'header': 'Drive Type Opts '}
You can see that this command gives the same output as the previous one.
But you can also run the above command with the "-c pyver" argument instead of "-c pdp", giving:
$ hug -f hug_pdp.py -c pyver
3.5.2
(I added the pyver() function to the program later, after the initial runs with just the pdp() function, to figure out how using the hug command to run the program differs from using the python command. The answer can be seen from the above output, though there is another difference too, shown below: the web interface.)
Next, I ran it this way:
$ hug -f hug_pdp.py
which started a web server (running on port 8000), giving this output on the console:
/#######################################################################\
`.----``..-------..``.----.
:/:::::--:---------:--::::://.
.+::::----##/-/oo+:-##----:::://
`//::-------/oosoo-------::://. ## ## ## ## #####
.-:------./++o/o-.------::-` ``` ## ## ## ## ##
`----.-./+o+:..----. `.:///. ######## ## ## ##
``` `----.-::::::------ `.-:::://. ## ## ## ## ## ####
://::--.``` -:``...-----...` `:--::::::-.` ## ## ## ## ## ##
:/:::::::::-:- ````` .:::::-.` ## ## #### ######
``.--:::::::. .:::.`
``..::. .:: EMBRACE THE APIs OF THE FUTURE
::- .:-
-::` ::- VERSION 2.2.0
`::- -::`
-::-` -::-
\########################################################################/
Copyright (C) 2016 Timothy Edmund Crosley
Under the MIT License
Serving on port 8000...
Then I went to this URL in my browser:
http://localhost:8000/pdp
which gave me this browser output:
{"detail": {"0": "C:\\ NTFS rw,fixed", "2": "E:\\ CDFS ro,cdrom"},
"header": "Drive Type Opts "}
which is basically the same as the earlier command-line interface output I got.
Next I went to this URL:
http://localhost:8000/pyver
which gave me this:
"3.5.2 "
which again is the same as the earlier corresponding command-line output of the hug command.
Of course, the output from both the web and CLI interfaces is either JSON or a dict, so in a real-life app we would have to take that output and process it further, e.g. format it better for human consumption. With a JavaScript front-end that is easily done; when using the code as-is in command-line mode, we need to figure out a way to do it ourselves. The hug module may have some support for that.
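For example, here is one hypothetical way a consumer could take the JSON from the /pdp route and print it as a plain table (the sample payload mirrors the run shown earlier):

```python
import json

# JSON as returned by the /pdp route (sample payload from the run above).
raw = ('{"detail": {"0": "C:\\\\ NTFS rw,fixed", "2": "E:\\\\ CDFS ro,cdrom"}, '
       '"header": "Drive Type Opts"}')
data = json.loads(raw)

# Print the header line, then each partition row in key order.
print(data["header"])
for key in sorted(data["detail"]):
    print(data["detail"][key])
```

In a real client you would fetch `raw` over HTTP (e.g. with urllib) instead of hard-coding it.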
What is also interesting is that when I run it this way:
http://localhost:8000/
I get this browser output:
{
"404": "The API call you tried to make was not defined. Here's a definition
of the API to help you get going :)",
"documentation": {
"overview": "\nhug_pdp.py\nUse hug with psutil to show disk partition
info \nvia Python, CLI or Web interfaces.\nCopyright 2017 Vasudev Ram\nWeb site:
https://vasudevram.github.io\nBlog: http://jugad2.blogspot.com\nProduct store:
https://gumroad.com/vasudevram\n",
"handlers": {
"/pdp": {
"GET": {
"usage": "Get disk partition data",
"examples": [
"http://localhost:8000/pdp?drives=0,1"
],
"outputs": {
"format": "JSON (Javascript Serialized Object Notation)",
"content_type": "application/json"
}
}
},
"/pyver": {
"GET": {
"usage": "Get Python version",
"examples": [
"http://localhost:8000/pyver"
],
"outputs": {
"format": "JSON (Javascript Serialized Object Notation)",
"content_type": "application/json"
}
}
}
}
}
}
which shows that trying to access an unsupported route gives you, as output, an overview of the API: the supported URLs/routes, HTTP methods, documentation about how to use them, and the output formats. Almost none of that required writing any code, mind.
Go give your Python code a hug!
- Vasudev Ram - Online Python training and consulting