
Planet Python

Last update: May 05, 2016 07:48 AM

May 05, 2016


PythonClub - A Brazilian collaborative blog about Python

GitHub Pages with Pelican and Travis-CI

Originally published at: df.python.org.br/blog/github-pages-com-pelican-e-travis-ci

Hello, everyone!

I'm writing this post to help anyone who wants to create a site on GitHub Pages, using Pelican to generate the pages and Travis-CI to automate the build and publishing steps.

This guide assumes the reader has accounts on GitHub and Travis-CI and is familiar with the Python environment. The Pelican version used while writing this post was 3.6.

GitHub Pages

GitHub Pages is a free GitHub feature for hosting static content (HTML, CSS, JS and images) and publishing it under a github.io subdomain or even a custom domain. The way it works is what will guide our next steps.

In short, there are two ways to create a page with GitHub Pages:

1 - User/organization page

For this type of page, create a repository named usuario.github.io, where usuario is the username or organization name of the account in which the repository will be created:

User page repository

The content to be published goes on the master branch, and the Pelican source files go on the pelican branch.

2 - Project page

For this type of page, create a repository named meuprojeto, where meuprojeto is the desired name for the project, which will be published at usuario.github.io/meuprojeto:

Project page repository

The content to be published goes on the gh-pages branch, and the Pelican source files go on the pelican branch.

For more information, visit the official GitHub Pages site.

Pelican

Pelican is a static site generator optimized by default for building blogs. For demonstration purposes we will use the default blog setup here, following the user/organization page path; any difference for the project page path will be described when necessary.

To install Pelican, just run:

$ pip install pelican==3.6

To create a project, do:

$ mkdir humrochagf.github.io
$ cd humrochagf.github.io
$ pelican-quickstart
Welcome to pelican-quickstart v3.6.3.

This script will help you create a new Pelican-based website.

Please answer the following questions so this script can generate the files
needed by Pelican.


> Where do you want to create your new web site? [.]
> What will be the title of this web site? Meu Blog
> Who will be the author of this web site? Humberto Rocha
> What will be the default language of this web site? [en] pt
> Do you want to specify a URL prefix? e.g., http://example.com   (Y/n) n
> Do you want to enable article pagination? (Y/n) y
> How many articles per page do you want? [10]
> What is your time zone? [Europe/Paris] America/Sao_Paulo
> Do you want to generate a Fabfile/Makefile to automate generation and publishing? (Y/n) y
> Do you want an auto-reload & simpleHTTP script to assist with theme and site development? (Y/n) y
> Do you want to upload your website using FTP? (y/N) n
> Do you want to upload your website using SSH? (y/N) n
> Do you want to upload your website using Dropbox? (y/N) n
> Do you want to upload your website using S3? (y/N) n
> Do you want to upload your website using Rackspace Cloud Files? (y/N) n
> Do you want to upload your website using GitHub Pages? (y/N) y
> Is this your personal page (username.github.io)? (y/N) y
Done. Your new project is available at /caminho/para/humrochagf.github.io

Initialize a repository in this directory and push the files to the pelican branch:

$ git init
$ git remote add origin git@github.com:humrochagf/humrochagf.github.io.git
$ git checkout -b pelican
$ git add .
$ git commit -m 'iniciando branch pelican'
$ git push origin pelican

To publish the content to the master branch, the ghp-import module is required:

$ pip install ghp-import
$ printf 'pelican==3.6\nghp-import\n' > requirements.txt
$ git add requirements.txt
$ git commit -m 'adicionando requirements'
$ git push origin pelican

Publishing the blog:

$ make github
First publication of the blog

To publish in the project page case, change the value of the GITHUB_PAGES_BRANCH variable in the Makefile from master to gh-pages.

Now that our blog is running on GitHub Pages, let's automate the page generation task so we can change the blog's content and write new posts without needing to be on a machine with the Pelican environment configured.

Travis-CI

Travis-CI is a Continuous Integration platform that builds and tests projects hosted on GitHub, and it will be our tool for automating the build of the blog pages.

The first thing to do is go to Travis-CI and enable your repository.

Enabling the repository on Travis

Next, go to the repository settings on Travis and disable the Build pull requests option, so your blog is not updated when someone opens a pull request, and enable Build only if .travis.yml is present, so that only branches containing a .travis.yml file trigger a blog update.

Configuring the repository on Travis

The next step is to create a Deploy Key so that Travis can publish content to GitHub. To do that, generate an SSH key at the root of the local repository:

$ ssh-keygen -f publish-key
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in publish-key.
Your public key has been saved in publish-key.pub.

With the key created, we will encrypt it using the Travis-CLI tool (make sure it is installed on your machine) so we can publish it to our repository without exposing the contents of the private key:

$ travis encrypt-file publish-key
Detected repository as humrochagf/humrochagf.github.io, is this correct? |yes| yes
encrypting publish-key for humrochagf/humrochagf.github.io
storing result as publish-key.enc
storing secure env variables for decryption

Please add the following to your build script (before_install stage in your .travis.yml, for instance):

    openssl aes-256-cbc -K $encrypted_591fe46d4973_key -iv $encrypted_591fe46d4973_iv -in publish-key.enc -out publish-key -d

Pro Tip: You can add it automatically by running with --add.

Make sure to add publish-key.enc to the git repository.
Make sure not to add publish-key to the git repository.
Commit all changes to your .travis.yml.

As the command output says, we could pass the --add option to insert the information into .travis.yml automatically; however, to avoid overwriting any command that may already exist in your file, editing it manually is recommended.

In our case we will create the file:

$ touch .travis.yml

And add the following content:

sudo: false
branches:
  only:
  - pelican
language: python
before_install:
# replace the line below with the output of the command:
# travis encrypt-file publish-key
# but keep the ending:
# -out ~/.ssh/publish-key -d
- openssl aes-256-cbc -K $encrypted_591fe46d4973_key -iv $encrypted_591fe46d4973_iv -in publish-key.enc -out ~/.ssh/publish-key -d
- chmod u=rw,og= ~/.ssh/publish-key
- echo "Host github.com" >> ~/.ssh/config
- echo "  IdentityFile ~/.ssh/publish-key" >> ~/.ssh/config
# replace git@github.com:humrochagf/humrochagf.github.io.git
# with your repository's SSH access address
- git remote set-url origin git@github.com:humrochagf/humrochagf.github.io.git
# If you are building a project page, change master:master
# to gh-pages:gh-pages
- git fetch origin -f master:master
install:
- pip install --upgrade pip
- pip install -r requirements.txt
script:
- make github

Next we remove the unencrypted private key so we don't run the risk of publishing it to the repository:

$ rm publish-key

WARNING: Under no circumstances add the publish-key file to your repository, since it contains the unencrypted private key, which has commit access to your repository and must not be published. Add only the publish-key.enc file. If you added it by mistake, redo the key generation and encryption steps to create a new key.

Now we add the files to the repository:

$ git add .travis.yml publish-key.enc
$ git commit -m 'adicionando arquivos do travis'
$ git push origin pelican

To grant Travis access, we add the deploy key on GitHub with the contents of the public key publish-key.pub:

Adding the deploy key on GitHub

Done! Now we can publish content to our blog without needing Pelican installed on the machine:

Writing the first post

Which Travis will publish for you:

Blog with the first post

If you got excited about creating your own personal blog and want to learn more about Pelican, you can follow the Mind Bending series on the subject.

May 05, 2016 12:46 AM

May 04, 2016


Mike Driscoll

Python 201 Book Writing Update: Part 1 is Ready

I’ve been busily working on my second book, Python 201: Intermediate Python. In part one of the book, there are 10 chapters. I recently finished up the last chapter for that part of the book. While I have some tweaks I want to do to a couple of the chapters in this part of the book, I’m going to leave them alone for now so I can get part 2 done. Then I’ll be going back to part 1 to do some updates. This also allows the early adopters time to read the first chapters and send me messages about typos or bugs.

For those of you who didn’t get in on the Kickstarter for the book, the first 10 chapters are as follows:

  • Chapter 1 – The argparse module
  • Chapter 2 – The collections module
  • Chapter 3 – The contextlib module (Context Managers)
  • Chapter 4 – The functools module (Function overloading, caching, etc)
  • Chapter 5 – All about imports
  • Chapter 6 – The importlib module
  • Chapter 7 – Iterators and Generators
  • Chapter 8 – The itertools module
  • Chapter 9 – The re module (An Intro to Regex in Python)
  • Chapter 10 – The typing module (Type Hinting)

There are currently 71 pages in the book so far in my Gumroad edition and over 80 pages in the Leanpub version. Leanpub is generated differently which means they use different fonts and font sizes, which is why that version has more pages.  Regardless, the book is coming along well and is still on track for a September, 2016 release!

May 04, 2016 09:04 PM


PyCharm

Debugger Interview with PyDev and PyCharm

PyCharm’s visual debugger is one of its most powerful and useful features. The debugger got a big speedup in the recent PyCharm release, and it has an interesting backstory: JetBrains collaborated with PyDev, the popular Python plugin for Eclipse, and funded the work on performance improvements for the debugger backend the two projects share.

To tell us more about the improvements as well as the cross-project cooperation, we interviewed the principals: Fabio Zadrozny, creator of PyDev, and Dmitry Trofimov, Team Lead for PyCharm.

Let’s jump right into it. Tell us about the speedups in the latest pydevd release.

FZ: The performance has always been a major focus of the debugger. I think that’s actually a requisite for a pure-python debugger.

To give an example here: Python debuggers work through the Python tracing facility (i.e.: sys.settrace), by handling tracing calls and deciding what to do at each call.

Usually a debugger would be called at each step to decide what to do, but pydevd is actually able to completely disable the tracing for most contexts (any context that doesn’t have a breakpoint inside it should run untraced) and re-evaluate its assumptions if a breakpoint is added.
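
To illustrate the mechanism being described here, a minimal sketch of how a pure-Python debugger hooks the tracing facility (this is not pydevd's actual code, and the breakpoint set is hypothetical):

import sys

BREAKPOINTS = {('example.py', 42)}  # hypothetical breakpoints: (filename, line)

def trace_lines(frame, event, arg):
    # Local trace function: only installed for frames that might hit a breakpoint.
    if event == 'line':
        key = (frame.f_code.co_filename, frame.f_lineno)
        if key in BREAKPOINTS:
            print('hit breakpoint at', key)  # a real debugger would suspend here
    return trace_lines

def trace_calls(frame, event, arg):
    # Global trace function, called on every function call.
    filename = frame.f_code.co_filename
    if not any(bp_file == filename for bp_file, _ in BREAKPOINTS):
        return None  # no breakpoint in this context: let the frame run untraced
    return trace_lines

sys.settrace(trace_calls)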

Now, even having the performance as a major focus, the latest release was still able to give really nice speedups (the plain Python version had a 40% speed improvement overall while the Cython version had a 140% increase).

I must say that at that point, there weren’t any low-hanging fruits for speeding up the debugger, so, the improvement actually came from many small improvements and Cython has shown that it can give a pretty nice improvement given just a few hints to it.

DT: The performance of the debugger was one of the top-voted requests in the PyCharm tracker. The latest release addresses this by implementing some parts of the debugger in Cython, which leads to huge performance improvements on all types of projects.

Was the Cython decision an easy one?

FZ: Actually, yes, it was a pretty straightforward decision…

The main selling point is that the Cython version is very similar to the Python version, so, the same codebase is used for Cython and plain Python code — the Cython version is generated from the plain Python version by preprocessing it with a mechanism analogous to #IFDEF statements in C/C++.

Also, this means that with the same codebase it’s possible to support CPython (which can have the Cython speedups) while also supporting Jython, PyPy, IronPython, etc. I even saw someone post about the debugger being used in a javascript implementation of Python.

DT: The idea was to make the debugger faster by rewriting the bottlenecks in C, but at the same time optional to have any compiled binaries, so that pure Python version would still work. Also, it was desirable to have as little code duplication as possible. Cython let us do all that perfectly, so it was a natural decision.

Let’s take a step back and discuss the 2014 decision to merge efforts. How did this conversation get started?

FZ: I was doing a crowdfunding for PyDev which had a profiler as one of its main points, which was something that PyCharm wanted to add too. Although the initial proposal didn’t come through, we started talking about what we already had in common, which was the debugger backend and how each version had different features at that point. I think PyCharm had just backported some of the changes I had done in the latest PyDev version at that time to its fork, and we agreed it would be really nice if we could actually work in the same codebase.

DT: We had used a fork of the PyDev debugger since the beginning of PyCharm, and occasionally I would check what was going on in the PyDev branch to backport features and fixes from there to PyCharm. Meanwhile, Fabio did the same, taking the latest fixes from the PyCharm branch. As time passed and the branches diverged, it was getting more and more difficult to compare them and backport fixes from one to the other.

After one of the tough merges, I thought, maybe we’d better create a common project that would be used in both IDEs. So I decided to contact Fabio and was very happy when he supported the idea.

Did the merging/un-forking go as you planned, or were there technical or project challenges?

FZ: The merging did go as planned…

The main challenge was the different feature set each version had back then. For instance, PyDev had some improvements on dealing with exceptions, finding referrers, stackless and debugger reload, whereas PyCharm had things such as the multiprocessing, gevent and Django templates (and the final version had to support everything from both sides).

The major pain point on the whole merging was actually on the gevent support, because the debugger really needs threads to work and gevent has an option for monkey-patching the threading library, which made the debugger go haywire.

DT: The main challenge was to test all the fixes done for the PyCharm fork of the debugger for possible regressions in the merged version. We had a set of tests for the debugger, but the coverage, of course, wasn't 100%. So we made a list of all the debugger issues fixed over the last 3 years (around 150 issues) and just tested them. That helped us ensure that we wouldn't have regressions in the release.

Fabio, how did it go on your end, having JetBrains sponsor some of your work? Any pushback in your community?

FZ: I must say I didn’t really have any pushback from the community. I’ve always been pretty open-minded about the code on PyDev (which was being used early on in PyCharm for the debugger) and I believe IDEs are a really personal choice. So I’m happy that the code I wrote can reach more people, even if not directly inside PyDev. Also, I think the community saw it as a nice thing, as the improvements in the debugger made both PyDev and PyCharm better IDEs.

The Python-oriented IDEs likely have some other areas where they face common needs. What do you think are some top issues for Python IDEs in 2016 and beyond?

FZ: I agree that there are many common needs on IDEs — they do have the same target after all, although with wildly different implementations 😉

Python code in particular is pretty hard to analyze in real-time — which contrasts with being simple and straightforward to read — and that’s something all “smart” Python IDEs have to deal with, so, there’s a fine balance on performance vs. features there, and that’s probably always going to be a top issue in any Python IDE.

Unfortunately, this is probably also a place where it’s pretty difficult to collaborate as the type inference engine is the heart of a Python IDE (and it’s also what makes it unique in a sense as each implementation ends up favoring one side or the other).

DT: The dynamic nature of Python has always been the main challenge for IDEs trying to provide assistance to developers. A huge step forward was made with Python 3.5, which added a type hinting notation, and with the typeshed repository, from which we will all benefit a lot. But this is still at an early stage, and we need to define and learn effective ways to adopt type hinting.

Python performance is also a challenge. In the Python world, when you care about performance, you switch from using pure Python to libraries written in C, like numpy. Or you try pypy. But in both cases performance and memory profiling becomes hard or even impossible with current standard tools and libraries. I think that tool developers can collaborate on that to provide better instruments for measuring and improving the performance of Python apps.

What’s in the future for pydevd, performance or otherwise?

FZ: I must say that performance wise, I think it has reached a nice balance on ease of development and speed, so, right now, the plan is not having any regression 😉

Regarding new development, I don’t personally have any new features planned — the focus right now is on making it rock-solid!

DT: One of the additions to pydevd from the PyCharm side is the ability to capture the types of the function arguments in the running program. PyCharm tries to use this information for code completion, but this feature now is optional and off by default. With the new type hinting in Python 3.5 this idea gets a new spin and the types collected in run-time could be used to annotate functions with types or verify the existing annotations. We are currently experimenting only with types, but it could be taken further to analyse call hierarchy etc.

May 04, 2016 12:07 PM


Montreal Python User Group

Montréal-Python 58: Dramatics Chartreuse

We're about a month away from the next PyCon conference in Portland, Oregon. We are organizing our 58th meetup at our lovely UQAM. Join us if you would like to see what the Python community in Montreal is up to.

As usual, we will be hosting guests presenting in both languages, and they will show you their projects and achievements.

Don't forget to join us after the meetup at the Benelux to celebrate spring in our lovely city.

Flash presentations

Kate Arthur: Kids CODE Jeunesse

Kids Code Jeunesse is dedicated to giving every Canadian child the chance to learn to code and to learn computational thinking. We introduce educators, parents and communities to intuitive teaching tools. We work in classrooms and community centres, host events, and give workshops to support engaging educational experiences for everyone.

Christophe Reverd: Club Framboise (http://clubframboise.ca/)

A presentation of Club Framboise, the community of Raspberry Pi users in Montreal.

Main presentations

Vadim Gubergrits: DIY Quantum Computer

An introduction to Quantum Computing with Python.

Pascal Priori: santropol-feast: Savoir-faire Linux and volunteers support the Santropol Roulant (https://github.com/savoirfairelinux/santropol-feast)

As part of the Maison du logiciel libre, Savoir-faire Linux and a group of volunteers are supporting the Santropol Roulant, a Montreal community organization, in building a Django-based platform to manage its client database. At the heart of Santropol Roulant's activities is its meals-on-wheels service, which cooks, prepares and delivers more than a hundred hot meals every day to people with reduced autonomy. The client database plays a key role in this chain of services. Built with Django, the project is looking for volunteers who want to get involved and contribute to the continued development of the platform!

George Peristerakis: How CI is done in Openstack

In George's last talk, there were a lot of questions about the details of integrating code review and continuous integration in OpenStack. This talk is a follow-up on the process and the technology behind implementing CI for OpenStack.

Where

UQÀM, Pavillon PK

201, Président-Kennedy avenue

Room PK-1140

When

Monday, May 9th 2016

Schedule

We’d like to thank our sponsors for their continued support:

May 04, 2016 04:00 AM


Dataquest

How to get into the top 15 of a Kaggle competition using Python

Kaggle competitions are a fantastic way to learn data science and build your portfolio. I personally used Kaggle to learn many data science concepts. I started out with Kaggle a few months after learning programming, and later won several competitions.

May 04, 2016 01:00 AM

May 03, 2016


PyPy Development

PyPy 5.1.1 bugfix released

We have released a bugfix for PyPy 5.1, due to a regression in installing third-party packages depending on numpy (using our numpy fork available at https://bitbucket.org/pypy/numpy ).

Thanks to those who reported the issue. We also fixed a regression in translating PyPy which increased the memory required to translate. The improvement will be noticed by downstream packagers and those who translate PyPy themselves rather than download pre-built binaries.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7. It's fast (PyPy and CPython 2.7.x performance comparison) due to its integrated tracing JIT compiler.

We also welcome developers of other dynamic languages to see what RPython can do for them.

This release supports:
  • x86 machines on most common operating systems (Linux 32/64, Mac OS X 64, Windows 32, OpenBSD, FreeBSD),
  • newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux,
  • big- and little-endian variants of PPC64 running Linux,
  • s390x running Linux
Please update, and continue to help us make PyPy better.

Cheers

The PyPy Team

May 03, 2016 06:18 PM


Weekly Python Chat

Debugging in Python

Having trouble figuring out what your code is doing? Debug-by-print not working well enough? Come to this chat and learn about the Python debugger!

George Khaburzaniya will show us how simple the Python debugger can be. We'll then do a Q&A about Python debugging, troubleshooting, and testing.

May 03, 2016 05:00 PM


Abu Ashraf Masnun

Python: Metaclass explained

One of the key features of the Python language is that everything is an object. These objects are instances of classes.

class MyClass:
    pass

a = int('5')
b = MyClass()

print(type(a))
print(type(b))

But hey, classes are objects too, no? Yes, they are.

class MyClass:
    pass

print(type(MyClass))

So the classes we define are of type type. So meta! But how are classes constructed from the type class? And didn't we just see, a moment ago, that type is a function that returns the type of an object?

Yes, when we pass just an object to type, it returns the type of that object. But if we pass it more details, it creates a class for us. Like this:

MyClass = type('MyClass', (), {})

instance = MyClass()
print(type(instance))

We can pass a name, the bases as a tuple, and the attributes as a dictionary to type, and we get back a class. The class extends the provided bases and has the attributes we provided.

When we define a class like this:

class MyClass(int):
    name = "MyClass"

It’s internally equivalent to MyClass = type("MyClass", (int,), {"name": "MyClass"}). Here type is the metaclass of MyClass.

Objects are instances of Classes and the Classes are instances of Metaclasses.

So that's the basic idea: we create objects from classes, and we create classes out of metaclasses. In Python, type is the default metaclass for all classes, but this can be customized as needed.

Metaclass Hook

So what if we do not want to use type as the metaclass of our classes? We want to customize the way our classes are created, and we don't have any good way of modifying how the type metaclass works. So how do we roll our own metaclass and use it?

The pretty obvious way is to use the MyClass = MyMetaClass(name, bases, attrs) approach. But there's another way to hook in a custom metaclass for a class. In Python 2, a class could define a __metaclass__ attribute pointing to the callable responsible for creating the class. In Python 3, we pass the metaclass callable as a keyword argument in the base class list:

# Python 3
class MyClass(metaclass=MetaClass):
    pass

# Python 2
class MyClass():
    __metaclass__ = MetaClass

This metaclass argument has to be a callable which takes the name, bases and attributes as its arguments and returns a class object. Please note, the metaclass argument itself does not need to be a metaclass, as long as it is a factory-like callable that creates and returns classes.

def func_metaclass(name, bases, attrs):
    attrs['is_meta'] = True
    return type(name, bases, attrs)

class MyClass(metaclass=func_metaclass):
    pass

That is a very simple example of a function being used as a metaclass callable. Now let’s use classes.

class MetaClass(type):
    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        cls.is_meta = True

class MyClass(metaclass=MetaClass):
    pass


print(MyClass.is_meta)

Here, MetaClass is called with these arguments, which are in effect passed to its __new__ method, and we get a class. We subclassed type, so we didn't need to provide our own implementation of the __new__ method. After __new__ is called, the __init__ method is called for initialization purposes. We added an extra attribute to the class in our overridden __init__ method.

In our function example, we called the type metaclass directly, so all the classes generated by that function are of type type. In our class-based example, on the other hand, we extended type, so the type of the generated classes is our metaclass. That is why the class-based approach is usually preferable.
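
A quick check of that difference, assuming the func_metaclass and MetaClass defined above (the class names here are just for illustration):

class FuncMade(metaclass=func_metaclass):
    pass

class ClassMade(metaclass=MetaClass):
    pass

print(type(FuncMade))   # <class 'type'>
print(type(ClassMade))  # <class '__main__.MetaClass'>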

Use Cases

Let’s keep track of subclasses:

class TrackSubclasses(type):
    subclasses = {}

    def __init__(cls, name, bases, attrs):
        for base in bases:
            cls.subclasses[base] = cls.subclasses.get(base, 0) + 1

        super().__init__(name, bases, attrs)


class A(metaclass=TrackSubclasses):
    pass


class B(A):
    pass


class C(A):
    pass


class D(B):
    pass

print(TrackSubclasses.subclasses)

Or make a class final:

class Final(type):
    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)

        for klass in bases:
            if isinstance(klass, Final):
                raise TypeError("{} is final".format(klass.__name__))

class FinalClass(metaclass=Final):
    pass

class ChildClass(FinalClass):
    pass
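
Running the snippet above, the ChildClass definition itself is what raises, so the expected output ends with something like:

Traceback (most recent call last):
  ...
TypeError: FinalClass is final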

May 03, 2016 03:57 PM


PyCon

Introducing our 2016 Keystone Sponsor: Heroku!

We organizers of PyCon 2016 are grateful that, amidst a roiling stock market and uncertain economy, so many sponsors have stepped forward to assert that their relationship with the Python community is worth investing in. And we are particularly happy to announce that our highest level of sponsorship has been filled.

That’s right — a Keystone sponsor has stepped forward: Heroku is our Keystone sponsor for PyCon 2016!

If you have attended a recent PyCon, you might remember visiting Heroku’s elegant booth in the Expo Hall. And many more of you in the community have used Heroku before to deploy web projects large and small — in their own words:

“Heroku is a cloud platform that lets you build, deploy, manage and scale apps. We’re the fastest way from git push to a live app, because we let you bypass infrastructure and deployment headaches. You just focus on your code, and we make the rest easy.”

Speaking from personal experience from when I helped build a Django app for a non-profit: it is dismaying to explain to a small organization how much work is traditionally involved in self-deploying a new app. An organization would rent or purchase a server, monitor its logs, keep it patched and updated, install the app and Django and the other Python dependencies, install PostgreSQL, give the app access to the database, and establish backups that they then have to monitor and archive.

All of that disappeared when I pointed the organization at Heroku. Their app now serves users every day, without their staff having had to spend even a moment worrying whether their PostgreSQL write-ahead log is working properly, whether a critical operating system patch is overdue, or whether the database is being backed up.

I asked the folks at Heroku why PyCon is on their list of conferences each year:

“We know that building the best platform for Python developers is easier when we can talk to them and find out what’s happening on the ground. So, we’re thrilled to be participating again — so thrilled that we’re the Keystone sponsor of PyCon 2016. We can’t wait to explore Portland with you all, and build some really wonderful apps and memories along the way.”

And how did Python itself get on their radar in the first place? Has the language been a successful choice of target for their platform?

“Python is simple and elegant — which is exactly what your deploys on Heroku feel like. We’ve been seeing amazing growth in Python on Heroku, and that’s why we have folks on our team like Kenneth Reitz, who can advocate for the needs and interests of the community. He makes sure that Pythonistas are happy with the Heroku experience.”

The Kenneth Reitz they mention is, as you probably know, the famous author of the Requests library. When not working on his open source projects, he has spent the last several years crafting Heroku’s support for hosting Python-language applications.

We look forward to seeing Heroku in the Expo Hall at PyCon 2016, and are excited that they have stepped forward this year to take on the responsibility of the Keystone sponsorship. Thank you, Heroku!

May 03, 2016 01:25 PM


Peter Bengtsson

How to track Google Analytics pageviews on non-web requests (with Python)

tl;dr; Use raven's ThreadedRequestsHTTPTransport transport class to send pageview trackings to Google Analytics asynchronously, so you can collect pageviews that aren't actually browser page loads.

We have an API on our Django site that was not designed from the ground up. We had a bunch of internal endpoints that were used by the website. So we simply exposed those as API endpoints that anybody can query. All we did was wrap certain parts carefully as to not expose private stuff and we wrote a simple web page where you can see a list of all the endpoints and what parameters are needed. Later we added auth-by-token.

Now the problem we have is that we don't know which endpoints people use and, just as important, which ones people don't use. If we had more stats we'd be able to confidently deprecate some (for easier maintenance) and optimize others (to avoid resource overuse).

Our first attempt was to use statsd to collect metrics and display those with graphite. But it just didn't work out. There are just too many different "keys". Basically, each endpoint (aka URL, aka URI) is a key. And if you include the query string parameters, the number of keys just gets nuts. Statsd and graphite are better when you have about as many keys as you have fingers on one hand. For example, HTTP status codes: 200, 302, 400, 404 and 500.

Also, we already use Google Analytics to track pageviews on our website, which is basically a measure of how many people render web pages that have HTML and JavaScript. Google Analytics' UI is great and powerful. I'm sure other competing tools like Mixpanel, Piwik, Gauges, etc. are great too, but Google Analytics is reliable, likely to stick around and something many people are familiar with.

So how do you simulate pageviews when you don't have JavaScript rendering? The answer: plain HTTP POST (HTTPS, of course). And how do you prevent blocking on sending analytics, so your users don't have to wait? By doing it asynchronously, either with threading or a background worker message queue.
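
For reference, a single blocking hit against the Measurement Protocol endpoint (a sketch using the requests library, with a placeholder tracking ID and path) would look something like this; the rest of this post is about sending the same thing without blocking:

import uuid

import requests

# One synchronous "pageview" hit to Google Analytics (placeholder values).
response = requests.post(
    'https://ssl.google-analytics.com/collect',
    data={
        'v': 1,                      # protocol version
        'tid': 'UA-XXXXX-Y',         # your tracking ID (placeholder)
        'cid': uuid.uuid4().hex,     # anonymous client ID
        't': 'pageview',             # hit type
        'dp': '/api/some/endpoint',  # document path (example)
    },
    timeout=5,
)
print(response.status_code)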

Threading or a message queue

If you have a message queue configured and you're confident in how it runs, you should probably use that. But it adds a certain element of complexity: now you need to maintain one or more consumers and the central message queue itself. What if you don't have a message queue set up at all? Use Python threading.

To do the threading, which is hard, it's always a good idea to try to stand on the shoulders of giants. Or, if you can't find a giant, find something that is mature and proven to work well over time. We found that in Raven.

Raven is the Python library, or "agent", used for Sentry, the open source error tracking software. As you can tell by the name, Raven tries to be quite agnostic of Sentry the server component. Inside it, it has a couple of good libraries for making threaded jobs whose task is to make web requests. In particular, the awesome ThreadedRequestsHTTPTransport. Using it looks basically like this:

import urlparse
from raven.transport.threaded_requests import ThreadedRequestsHTTPTransport

transporter = ThreadedRequestsHTTPTransport(
    urlparse.urlparse('https://ssl.google-analytics.com/collect'),
    timeout=5
)

params = {
    ...more about this later...
}

def success_cb():
    print "Yay!"

def failure_cb(exception):
    print "Boo :("

headers = {}  # no extra HTTP headers needed for this example

transporter.async_send(
    params,
    headers,
    success_cb,
    failure_cb
)

The call isn't very different from regular plain old requests.post.

About the parameters

This is probably the most exciting part and the place where you need some thought. It's non-trivial because you might need to put some careful thought into what you want to track.

Your friend here is: This documentation page

There's also the Hit Builder tool where you can check that the values you are going to send make sense.

Some of the basic ones are easy:

"Protocol Version"

Just set to v=1

"Tracking ID"

That code thing you see in the regular chunk of JavaScript you put in the head, e.g. tid=UA-1234-Z

"Data Source"

Optional word you call this type of traffic. We went with ds=api because we use it to measure the web API.

The user ones are a bit more tricky. Basically because you don't want to accidentally leak potentially sensitive information. We decided to keep this highly anonymized.

"Client ID"

A random UUID (version 4) number that identifies the user or the app. Not to be confused with "User ID", which is basically a string that identifies the user's session storage ID or something. Since in our case we don't have a user (unless they use an API token) we leave this to a new random UUID each time. E.g. cid=uuid.uuid4().hex. This field is not optional.

"User ID"

Some string that identifies the user but doesn't reveal anything about the user. For example, we use the PostgreSQL primary key ID of the user as a string. It just means we can know if the same user makes several API requests, but we can never know who that user is. Google Analytics uses it to "lump" requests together. This field is optional.

Next we need to pass information about the hit and the "content". This is important, especially the "Hit type", because this is where you make your manual server-side tracking act as if the user had clicked around on the website with a browser.

"Hit type"

Set this to t=pageview and it'll show up in Google Analytics as if the user had just navigated to the URL in her browser. It's kinda weird to do this because clearly the user hasn't. Most likely she's used curl or something from the command line. So it's not really a pageview but, on our end, we have "views" in the webserver that produce information for the user. Some of it is HTML and some of it is JSON, in terms of output format, but either way they're sending us a URL and we respond with data.

"Document location URL"

The full absolute URL that was used. E.g. https://www.example.com/page?foo=bar. So in our Django app we set this to dl=request.build_absolute_uri(). If you have a site where you might have multiple domains in use but want to collect them all under just one specific domain, you need to set dh=example.com.

"Document Host Name" and "Document Path"

I actually don't know what the point of this is if you've already set the "Document location URL".

"Document Title"

In Google Analytics you can view your Content Drilldown by title instead of by URL path. In our case we set this to a string we know from the internal Python class that is used to make the API endpoint. dt='API (%s)'%api_model.__class__.__name__.

There are many more things you can set, such as the client's IP, the user agent, timings, and exceptions. We chose NOT to include the user's IP. If people using the JavaScript version of Google Analytics can set their browser to NOT send the IP, we should respect that. Also, it's rarely interesting to see where the requests for a web API come from, because it's often a server's curl or requests call making the query, not a human.

Sample implementation

Going back to the code example mentioned above, let's demonstrate a fuller example:

import logging
import urlparse
import uuid

from raven.transport.threaded_requests import ThreadedRequestsHTTPTransport

logger = logging.getLogger(__name__)

transporter = ThreadedRequestsHTTPTransport(
    urlparse.urlparse('https://ssl.google-analytics.com/collect'),
    timeout=5
)

# Remember, this is a Django app, but you get the idea.
# `settings`, `RequestSite`, `request` and `model_class` come from
# the surrounding Django view this snippet was lifted from.

domain = settings.GOOGLE_ANALYTICS_DOMAIN
if not domain or domain == 'auto':
    domain = RequestSite(request).domain

params = {
    'v': 1,
    'tid': settings.GOOGLE_ANALYTICS_ID,
    'dh': domain,
    't': 'pageview',
    'ds': 'api',
    'cid': uuid.uuid4().hex,
    'dp': request.path,
    'dl': request.build_absolute_uri(),
    'dt': 'API ({})'.format(model_class.__class__.__name__),
    'ua': request.META.get('HTTP_USER_AGENT'),
}

headers = {}  # no extra HTTP headers needed

def success_cb():
    logger.info('Successfully informed Google Analytics (%s)', params)

def failure_cb(exception):
    logger.exception(exception)

transporter.async_send(
    params,
    headers,
    success_cb,
    failure_cb
)

How to unit test this

The class we're using, ThreadedRequestsHTTPTransport, has, as you might have seen, a method called async_send. There's also one with the exact same signature, called sync_send, which does the same thing but in a blocking fashion. So you could make your code do something silly like this:

def send_tracking(page_title, request, async=True):
    # ...same as example above but wrapped in a function...
    function = async and transporter.async_send or transporter.sync_send
    function(
        params,
        headers,
        success_cb,
        failure_cb
    )

And then in your tests you pass in async=False instead.
But don't do that. The code shouldn't be subservient to the tests (unless it's for the sake of splitting up monster-long functions).
Instead, I recommend you mock the inner workings of that ThreadedRequestsHTTPTransport class so you can make the whole operation synchronous. For example...

import mock
from django.test import TestCase
from django.test.client import RequestFactory

from where.you.have import pageview_tracking


class TestTracking(TestCase):

    @mock.patch('raven.transport.threaded_requests.AsyncWorker')
    @mock.patch('requests.post')
    def test_pageview_tracking(self, rpost, aw):

        def mocked_queue(function, data, headers, success_cb, failure_cb):
            function(data, headers, success_cb, failure_cb)

        aw().queue.side_effect = mocked_queue

        request = RequestFactory().get('/some/page')
        with self.settings(GOOGLE_ANALYTICS_ID='XYZ-123'):
            pageview_tracking('Test page', request)

            # Now we can assert that 'requests.post' was called.
            # Left as an exercise to the reader :)
            print rpost.mock_calls       

This is synchronous now and works great. It's not finished. You might want to write a side effect for the requests.post so you can have better control of that post. That'll also give you a chance to potentially NOT return a 200 OK and make sure that your failure_cb callback function gets called.

How to manually test this

One thing I was very curious about when I started was to see how it worked if you really ran this for reals but without polluting your real Google Analytics account. For that I built a second little web server on the side, whose address I used instead of https://ssl.google-analytics.com/collect. So, change your code so that https://ssl.google-analytics.com/collect is not hardcoded but a variable you can change locally. Change it to http://localhost:5000/ and start this little Flask server:

import time
import random
from flask import Flask, abort, request

app = Flask(__name__)
app.debug = True

@app.route("/", methods=['GET', 'POST'])
def hello():
    print "- " * 40
    print request.method, request.path
    print "ARGS:", request.args
    print "FORM:", request.form
    print "DATA:", repr(request.data)
    if request.args.get('sleep'):
        sec = int(request.args['sleep'])
        print "** Sleeping for", sec, "seconds"
        time.sleep(sec)
        print "** Done sleeping."
    if random.randint(1, 5) == 1:
        abort(500)
    elif random.randint(1, 5) == 1:
        # really get it stuck now
        time.sleep(20)
    return "OK"

if __name__ == "__main__":
    app.run()

Now you get an insight into what gets posted and you can pretend that it's slow to respond. Also, you can get an insight into how your app behaves when this collection destination throws a 5xx error.

How to really test it

Google Analytics is tricky to test in that they collect all the stuff they collect, then take their time to process it, and it only shows up the next day as stats. But there's a hack! You can go into your Google Analytics account and click "Real-Time" -> "Overview" and you should see hits coming in as you're testing this. Obviously you don't want to do this on your real production account, but perhaps you have a stage/dev instance you can use. Or, just be patient :)

May 03, 2016 12:45 PM


Mike Driscoll

Python 201: An Intro to Iterators and Generators

You have probably been using iterators and generators since you started programming in Python but you may not have realized it. In this article, we will learn what an iterator and a generator are. We will also be learning how they are created so we can create our own should we need to.

Iterators

An iterator is an object that will allow you to iterate over a container. The iterator in Python is implemented via two distinct methods: __iter__ and __next__. The __iter__ method is required for your container to provide iteration support. It will return the iterator object itself. But if you want to create an iterator object, then you will need to define __next__ as well, which will return the next item in the container.

Note: In Python 2, the naming convention was slightly different. You still needed __iter__, but __next__ was called next.

To make things extra clear, let’s go over a couple of definitions:

  • iterable – an object that has the __iter__ method defined
  • iterator – an object that has both __iter__ and __next__ defined where __iter__ will return the iterator object and __next__ will return the next element in the iteration.

As with most magic methods (the methods with double-underscores), you should not call __iter__ or __next__ directly. Instead you can use a for loop or list comprehension and Python will call the methods for you automatically. There are cases when you may need to call them, but you can do so with Python’s built-ins: iter and next.

Before we move on, I want to mention Sequences. Python 3 has several sequence types such as list, tuple and range. The list is an iterable, but not an iterator because it does not implement __next__. This can be easily seen in the following example:

>>> my_list = [1, 2, 3]
>>> next(my_list)
Traceback (most recent call last):
  Python Shell, prompt 2, line 1
builtins.TypeError: 'list' object is not an iterator

When we tried to call the list’s next method in the example above, we received a TypeError and were informed that the list object is not an iterator.

>>> iter(my_list)
<list_iterator object at 0x7faaaa477a58>
>>> list_iterator = iter(my_list)
>>> next(list_iterator)
1
>>> next(list_iterator)
2
>>> next(list_iterator)
3
>>> next(list_iterator)
Traceback (most recent call last):
  Python Shell, prompt 8, line 1
builtins.StopIteration:

To turn the list into an iterator, just wrap it in a call to Python’s iter method. Then you can call next on it until the iterator runs out of items and StopIteration gets raised. Let’s try turning the list into an iterator and iterating over it with a loop:

>>> for item in iter(my_list):
...     print(item)
... 
1
2
3

When you use a loop to iterate over the iterator, you don’t need to call next and you also don’t have to worry about the StopIteration exception being raised.


Creating your own iterators

Occasionally you will want to create your own custom iterators. Python makes this very easy to do. As mentioned in the previous section, all you need to do is implement the __iter__ and __next__ methods in your class. Let’s create an iterator that can iterate over a string of letters:

class MyIterator:
 
    def __init__(self, letters):
        """
        Constructor
        """
        self.letters = letters
        self.position = 0
 
    def __iter__(self):
        """
        Returns itself as an iterator
        """
        return self
 
    def __next__(self):
        """
        Returns the next letter in the sequence or 
        raises StopIteration
        """
        if self.position >= len(self.letters):
            raise StopIteration
        letter = self.letters[self.position]
        self.position += 1
        return letter
 
if __name__ == '__main__':
    i = MyIterator('abcd')
    for item in i:
        print(item)

For this example, we only needed three methods in our class. In our initialization, we pass in the string of letters and create a class variable to refer to them. We also initialize a position variable so we always know where we’re at in the string. The __iter__ method just returns itself, which is all it really needs to do. The __next__ method is the meatiest part of this class. Here we check the position against the length of the string and raise StopIteration if we try to go past its length. Otherwise we extract the letter we’re on, increment the position and return the letter.

Let’s take a moment to create an infinite iterator. An infinite iterator is one that can iterate forever. You will need to be careful when calling these as they will cause an infinite loop if you don’t make sure to put a bound on them.

class Doubler:
    """
    An infinite iterator
    """
 
    def __init__(self):
        """
        Constructor
        """
        self.number = 0
 
    def __iter__(self):
        """
        Returns itself as an iterator
        """
        return self
 
    def __next__(self):
        """
        Increments the number and returns its square
        each time next is called.
        """
        self.number += 1
        return self.number * self.number
 
if __name__ == '__main__':
    doubler = Doubler()
    count = 0
 
    for number in doubler:
        print(number)
        if count > 5:
            break
        count += 1

In this piece of code, we don’t pass anything to our iterator. We just instantiate it. Then to make sure we don’t end up in an infinite loop, we add a counter before we start iterating over our custom iterator. Finally we start iterating and break out when the counter goes above 5.


Generators

A normal Python function will always return one value, whether it be a list, an integer or some other object. But what if you wanted to be able to call a function and have it yield a series of values? That is where generators come in. A generator works by “saving” where it last left off (or yielding) and giving the calling function a value. So instead of returning the execution to the caller, it just gives temporary control back. To do this magic, a generator function requires Python’s yield statement.

Side-note: In other languages, a generator might be called a coroutine.

Let’s take a moment and create a simple generator!

>>> def doubler_generator():
...     number = 2
...     while True:
...         yield number
...         number *= number
>>> doubler = doubler_generator()
>>> next(doubler)
2
>>> next(doubler)
4
>>> next(doubler)
16
>>> type(doubler)
<class 'generator'>

This particular generator will basically create an infinite sequence. You can call next on it all day long and it will never run out of values to yield. Because you can iterate over a generator, a generator is considered to be a type of iterator, though no one really refers to them as such. Under the covers, the generator also defines the __next__ method that we looked at in the previous section, which is why the next function we just used worked.
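
A quick check in the interpreter (reusing the doubler generator above) shows that a generator satisfies the iterator protocol:

>>> doubler = doubler_generator()
>>> iter(doubler) is doubler
True
>>> hasattr(doubler, '__next__')
True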

Let’s look at another example that only yields 3 items instead of an infinite sequence!

>>> def silly_generator():
...     yield "Python"
...     yield "Rocks"
...     yield "So do you!"
>>> gen = silly_generator()
>>> next(gen)
'Python'
>>> next(gen)
'Rocks'
>>> next(gen)
'So do you!'
>>> next(gen)
Traceback (most recent call last):
  Python Shell, prompt 21, line 1
builtins.StopIteration:

Here we have a generator that uses the yield statement 3 times. In each instance, it yields a different string. You can think of yield as the return statement for a generator. Whenever you call yield, the function stops and saves its state. Then it yields the value out, which is why you see something getting printed out to the terminal in the example above. If we’d had variables in our function, those variables would be saved too.

When you see StopIteration, you know that you have exhausted the iterator. This means that it ran out of items. This is normal behavior in all iterators as you saw the same thing happen in the iterators section.

Anyway, when we call next again, the generator resumes where it left off and yields the next value, or the function finishes and the generator stops. On the other hand, if you never call next again, the saved state will eventually be garbage collected.

Let’s reinstantiate the generator and try looping over it!

>>> gen = silly_generator()
>>> for item in gen:
...     print(item)
... 
Python
Rocks
So do you!

The reason we create a new instance of the generator is that if we tried looping over it, nothing would be yielded. This is because we already ran through all the values in that particular instance of the generator. So in this example, we create the new instance, loop over it and print out the values that are yielded. The for loop once again handles the StopIteration exception for us and just breaks out of the loop when the generator is exhausted.

One of the biggest benefits to a generator is that it can iterate over large data sets and return them one piece at a time. This is what happens when we open a file and return it line-by-line:

with open('/path/to/file.txt') as fobj:
    for line in fobj:
        # process the line

Python basically turns the file object into a generator when we iterate over it in this manner. This allows us to process files that are too large to load into memory. You will find generators useful for any large data set that you need to work with in chunks, or when you need to generate a large data set that would otherwise fill up all your computer's memory.
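
For instance, a small generator of your own for processing any large file in fixed-size chunks might look like this (a sketch; the path is just a placeholder):

def read_in_chunks(file_object, chunk_size=1024):
    """Lazily read a file piece by piece, yielding one chunk at a time."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data

with open('/path/to/file.txt') as fobj:
    for chunk in read_in_chunks(fobj):
        print(len(chunk))  # process the chunk here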


Wrapping Up

At this point you should now understand what an iterator is and how to use one. You should also know the difference between an iterable and an iterator. Finally, we learned what a generator is and why you might want to use one. For example, a generator is great for memory efficient data processing. Happy coding!

May 03, 2016 12:30 PM


Caktus Consulting Group

Florida Open Debate Platform Receives National Attention (The Atlantic, USA Today, Engadget)

Several national publications have featured the Florida Open Debate platform, including USA Today, Engadget, and The Atlantic. Caktus helped develop the Django-based platform on behalf of the Open Debate Coalition (ODC) in advance of the nation’s first-ever open Senate debate held in Florida on April 25th. The site enabled citizens to submit debate questions as well as vote on which questions mattered most to them. Moderators then used the thirty most popular questions from the site to structure the debate between Florida Senate candidates David Jolly (R) and Alan Grayson (D). According to The Atlantic, more than 400,000 votes were submitted by users on the site, including more than 84,000 from Florida voters.

Florida Open Debate user-submitted questions

“Normally, the press frames important US election debates by choosing the questions and controlling the video broadcast,” wrote Steve Dent. “For the first time, however, the public... decide[d] the agenda.”

In his article for The Atlantic, Russell Berman also applauded the site’s effort “to make bottom-up, user-generated questions the centerpiece of a debate.” But possibly more significant were the results of this crowd-sourced content. “What transpired was, by all accounts, a decent debate,” Berman writes. “For 75 minutes, Grayson and Jolly addressed several weighty policy disputes—money in politics, Wall Street reform, the minimum wage, climate change, the solvency of Social Security—and often in detail.”

The Florida debate was streamed live on Monday to more than 80,000 viewers. The Open Debate platform is receiving attention and interest from various potential debate sponsors, as well as the Commission on Presidential Debates, for possible use in this fall’s presidential elections.

May 03, 2016 12:00 PM


Kushal Das

dgplug.org is now using Lektor

A couple of years back we moved dgplug to a static website. But there are still times when we want to update parts of the site in a timely manner, and we also wanted to track the changes. We tried to maintain Sphinx-based docs, but somehow we never managed to do that well.

This Sunday Suraj, Chandan, and /me were having a casual discussion, and site management came up. I knew Armin had written something new on the static site side, but I had never managed to look into it. From the Lektor website:

A flexible and powerful static content management system for building
complex and beautiful websites out of flat files — for people who do not
want to make a compromise between a CMS and a static blog engine.

So Lektor is a static CMS, and that is what I was looking for. The dependency chain also looked reasonable. As usual, the documentation is in great shape. Yesterday I spent some time setting up a Fedora container which has Lektor inside, along with a small Flask-based webapp. Sayan helped fix the template/CSS issues. The web application listens for events from GitHub webhooks and rebuilds the site. You can find the source for our site here, and the container Dockerfile and other details are here. Note: the web application is actually very crude in nature :)
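
A minimal sketch of that kind of webhook listener (hypothetical paths and route; the real, admittedly crude, version lives in the repository linked above):

import subprocess

from flask import Flask, request

app = Flask(__name__)

@app.route('/webhook', methods=['POST'])
def rebuild():
    # Called by the GitHub webhook; pull the latest sources and rebuild the Lektor site.
    if request.headers.get('X-GitHub-Event') == 'push':
        subprocess.check_call(['git', 'pull'], cwd='/srv/site')
        subprocess.check_call(
            ['lektor', 'build', '--output-path', '/srv/www'], cwd='/srv/site')
    return 'OK'

if __name__ == '__main__':
    app.run()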

May 03, 2016 06:44 AM


Glyph Lefkowitz

Letters To The Editor: Re: Email

Since I removed comments from this blog, I’ve been asking y’all to email me when you have feedback, with the promise that I’d publish the good bits. Today I’m making good on that for the first time, with this lovely missive from Adam Doherty:


I just wanted to say thank you. As someone who is never able to say no, your article on email struck a chord with me. I have had Gmail since the beginning, since the days of hoping for an invitation. And the day I received my invitation was the last day my inbox was ever empty.

Prior to reading your article I had over 40,000 unread messages. It used to be a sort of running joke; I never delete anything. Realistically though was I ever going to do anything with them?

With 40,000 unread messages in your inbox, you start to miss messages that are actually important. Messages that must become tasks, tasks that must be completed.

Last night I took your advice; and that is saying something - most of the things I read via HN are just noise. This however spoke to me directly.

I archived everything older than two weeks, was down to 477 messages and kept pruning. So much of the email we get on a daily basis is also noise. Those messages took me half a second to hit archive and move on.

I went to bed with zero messages in my inbox, woke up with 21, archived 19, actioned 2 and then archived those.

Seriously, thank you so very much. I am unburdened.


First, I’d like to thank Adam for writing in. I really do appreciate the feedback.

Second, I wanted to post this here not in service of showcasing my awesomeness1, but rather to demonstrate that getting to the bottom of your email can have a profound effect on your state of mind. Even if it’s a running joke, even if you don’t think it’s stressing you out, there’s a good chance that, somewhere in the back of your mind, it is. After all, if you really don’t care, what’s stopping you from hitting select all / archive right now?

At the very least, if you did that, your mail app would load faster.


  1. although, let there be no doubt, I am awesome 

May 03, 2016 06:06 AM


Graeme Cross

PyCon AU’s Call for Proposals: one week to go!

The call for papers for PyCon Australia, including for the Science and Data miniconference, is down to its last week!

Make sure you register and get your talk/tutorial ideas down as soon as possible. Submissions can be edited up until the deadline.

Applications for financial assistance are also open, and we strongly encourage potential speakers to apply if needed. We can cover not only conference registration and travel/accommodation costs, but also substitute teacher costs, childcare, carer costs and other costs related to equity of access.

May 03, 2016 01:28 AM

May 02, 2016


Django Weblog

Django bugfix releases issued: 1.9.6 and 1.8.13

Today we've issued bugfix releases for the 1.9 and 1.8 release series. Details can be found in the release notes for 1.9.6 and 1.8.13.

The release package and checksums are available from our downloads page, as well as from the Python Package Index. The PGP key ID used for this release is Tim Graham: 1E8ABDC773EDE252.

May 02, 2016 10:54 PM


Tryton News

New Tryton release 4.0

We are proud to announce the 4.0 release of Tryton.

This is the first release of Tryton that adds support for Python 3. The server and most of the modules support it; the missing modules are mainly the ones using WebDAV and LDAP. The client will be ported once GTK-3 support is added.

The release also sees a large refactoring of the protocol stack, which was previously based on Python's SimpleHTTPServer. It now uses a WSGI application running on the Werkzeug server by default. Any WSGI server can be used to run Tryton, which removes the design constraint of a single-threaded process and opens the way for using workers.

All modules have been reviewed to enforce the naming convention for document identification. The name "code" is used for all referential documents like parties and products. The name "number" is used for the internal identification of all operational documents like sales, purchases, invoices etc. And finally, the name "reference" is used for identifications from external systems, like the sale order number of the supplier of your purchase.

Two new languages are now available on default installation: Lao and Simplified Chinese.

As Richard Stallman reminded us, the migration from previous series is fully supported.

Major changes for the user

  • The new note functionality provides management of general textual notes on any model in Tryton. Clicking the icon opens a notes dialog, where the user can maintain notes. The read state of every note is managed per user. Like the attachments, the icon in the tool-bar indicates when there are notes on a record.

    Tryton note Sao note
  • The CSV import and export have been heavily reworked for a better experience. The import dialog now supports drag and drop to order the selected columns, like the CSV export wizard. Both wizards are able to use any of the encodings available in Python. It is now possible to configure the CSV parameters for the export result.

    Export CSV
  • The charts provided by the graph view have been improved. They now use softer colors, thinner lines and smaller arcs. In the background, a dashed style is used instead of a normal line for the axes. A smart transparency value is applied to filled lines so that the chart always shows through them.

  • A new button in the scheduler allows running a job once, useful to run jobs on demand or to test new settings.

Accounting

  • The report design of the General Ledger, Trial Balance and Aged Balance has been reworked. They are now based on the new dynamic views. This provides a performance boost and allows filtering the records more precisely. In addition to the previous printable output, they also gain CSV export, which is useful for performing specific operations in a spreadsheet application.

    General Ledger
  • A date field is added to the Aged Balance, to modify the date on which the calculations are based. With this feature, it is possible to generate reports based on a past date, as if they had been generated on that date, ignoring reconciliations that happened afterwards.

  • The functionality of Third Party Balance is merged into the Aged Balance. We found that the Third Party Balance computed the same data as the Aged Balance with the type Customers and Suppliers.

Party

  • The Name field on party is no longer required for input. This solves a long-standing request to be able to maintain parties whose name is not known on creation.

Product

  • A configuration form is added to the product module with these options:

    • The default value for Use Category fields.
    • The default value for the Cost Price Method.
  • It was not always easy to explain the design of products and templates, especially when it was not really relevant for the current business. So we have redesigned both views to be very similar; indeed they now use the exact same layout. The fields that do not exist on the product are automatically filled with the value from the template.

    Product Variant
  • The field Category is replaced by a Categories field, to support adding many categories to one product. This is very useful, for example, to create multi-axis categories for a web shop.

Classification

This new module defines the reference basis to create different kinds of classifications for products. It adds a generic field Classification on the product form.

Classification Taxonomic

This new module introduces the taxonomic classification of products as an example using the new Classification module. It includes classifications by Taxon and by Cultivar.

Purchase

  • The field Delivery Time on product supplier is replaced by Lead Time which increases the precision from days to microseconds.
  • For each warehouse, it is now possible to define the location where the goods are picked in case of supplier return. If this location is not defined, the default storage will be used.

Request

The purchase request functionalities have been decoupled from stock_supply and sale_supply modules into a new separate module named purchase_request. This is to prepare future work that will use only purchase requests without the need of other stock_supply features.

  • A new state Exception is added to the purchase request. This is useful to manage cancelled purchases when linked to drop shipments.

Sale

  • The field Delivery Date on the Sale Line model is renamed to Shipping Date to avoid any confusion.
  • The field Delivery Time on product form is replaced by Lead Time which increases the precision from days to microseconds.
  • The custom history management on Sale Opportunity is replaced by the general revision functionality of the client. This increases its precision and works automatically for any new fields.

Stock

  • The address of the destination warehouse of the Internal Shipment is now displayed on the report.
  • Now it is possible to manually do a move with the new Do button. This is useful for example to get correct accounting when you have long living productions.
  • Supplier return shipments now have a supplier and a delivery address fields. Those fields will be automatically populated for shipments created from purchase.

Production

Routing

This new module defines the routings, steps and operations for productions. A routing is a list of ordered steps and each step is defined by a generic operation.

Work

This new module completes the routing module by creating the Works of a production based on its routing. A Work is linked to a Work Center which defines the cost using one of these two methods: Per Cycle or Per Hour. The cost of a work is computed using the Cycles created on it and later added to the global cost of the production.

Major changes for the developer

  • Domains now accept a new parent_of operator, which recursively returns all the records that are parents of the searched records. This is the opposite of the existing child_of operator.
  • It is now possible to inherit from a view that already inherits another view from a different model.
  • The new where domain operator is useful when you need to search on a xxx2Many with a full sub-domain instead of separate clauses. It has the advantage of avoiding an intermediary fetch by using a sub-query.
  • The Transaction design has been reworked to be closer to the design defined by PEP 249. This new design adds support for nested transactions. It also supports multiple cursors for the same transaction, reducing the memory consumption when iterating over large result sets.
  • A new context model is introduced, to save the trouble of writing simple wizards for configuring reports by setting some values in the context. With this new design, the developer can define a model for which each field will define the values of the context. The form of this model will be displayed on top of the view and the view will be automatically reloaded when the context is changed.
  • It is now possible to render reports as plain text, XML, HTML and XHTML. With this change the report infrastructure can be reused, for example, to design email templates.
  • This release adds support for the two-phase commit protocol, which allows coordinating distributed transactions. By default, Tryton uses a single transaction from the database back-end. But when Tryton has to communicate with other systems, it is good to use TPC to keep data integrity. The implementation follows the API of the Zope Data Manager, so the data managers of the Zope community can be used within Tryton.
  • Thanks to the two-phase commit protocol, mails can now be sent when the transaction is committed, so if something goes wrong and the transaction is rolled back, no mails are sent. A rough sketch of the data manager interface follows this list.
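To make the data manager idea more concrete, here is a generic sketch of the Zope-style data manager interface, using a hypothetical mail-sending resource as the example; it is not Tryton's actual implementation, just the shape of the API:

class SendMailDataManager(object):

    def __init__(self, send, message):
        self.send = send         # callable that actually delivers the mail
        self.message = message

    def abort(self, transaction):
        pass                     # transaction rolled back: drop the mail

    def tpc_begin(self, transaction):
        pass

    def commit(self, transaction):
        pass

    def tpc_vote(self, transaction):
        pass                     # raise here to veto the whole commit

    def tpc_finish(self, transaction):
        self.send(self.message)  # commit succeeded: deliver the mail

    def tpc_abort(self, transaction):
        pass                     # two-phase commit aborted: drop the mail

    def sortKey(self):
        return "sendmail"        # determines ordering among data managers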

Accounting

  • The reconciliation process now stores the date of the reconciliation. By default, it is the highest date of the reconciled lines. This allows filtering reconciled lines based on this date, for example to generate a report of the unreconciled lines before a specific date.

  • The Credit Notes have been merged into the Invoices. They are now standard invoices with negative quantities. This allows to easily group both types into a single document. The numbering can still be differentiated depending on the sign of the lines.

    Note: with the merge of Invoice and Credit Note, the signs of the taxes for Credit Note must be inverted manually.

Product

  • Uom.round is now an instance method which makes more sense according to its signature.

Purchase

  • The Purchase has received a done transition like the Sale to allow extensions to perform some action when this transition is performed.
  • It is now possible to search Purchase Requests using the Purchase field.

WebDAV

WebDAV has been decoupled from trytond into a separate module, which improves the modularity of the system. Indeed, many setups do not use WebDAV, so it was a little bit bloated to have it in the base. The dependencies of this module also prevented adding Python 3 support to trytond. So, for now, the WebDAV protocol is managed by a separate process, but it will probably return to the main process in the future.

May 02, 2016 06:00 PM


Will McGugan

Posting Screenshots with Inthing

Inthing can take a screenshot of your desktop then post it online.

Here's a quick example:

from inthing import Stream
my_stream = Stream.new()
my_stream.screenshot(title="My Desktop", delay=5).browse()

Alternatively you can do the same thing from the command line with the inthing app. The following is equivalent to the Python code (assuming you have set up the stream and password environment variables):

$ inthing screenshot --title "Screenshot from the command line" --delay 5 --browse

See the docs for more details.

Screenshot capabilities courtesy of the excellent pyscreenshot library.

Screenshot taken with Stream.screenshot

May 02, 2016 01:30 PM


Doug Hellmann

compileall — Byte-compile Source Files — PyMOTW 3

The compileall module finds Python source files and compiles them to the byte-code representation, saving the results in .pyc files. Read more… This post is part of the Python Module of the Week series for Python 3. See PyMOTW.com for more articles from the series.
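As a quick illustration of the kind of thing the module does (a minimal sketch, not taken from the PyMOTW article; the myproject directory name is an assumption):

import compileall

# compile every .py file under ./myproject, forcing recompilation
# even if the existing byte-code files look up to date
compileall.compile_dir('myproject', force=True, quiet=1)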

May 02, 2016 01:00 PM


Mike Driscoll

PyDev of the Week: Mark Lutz

This week we have the honor of welcoming Mark Lutz as our PyDev of the Week. Mark is the author of the first Python book ever, Programming Python. He has also authored Learning Python and the Python Pocket Reference, all three published by O’Reilly. Rather than rehash more of his background, let’s just jump into the interview so we can learn more about Mark Lutz!

 

Can you tell us a little about yourself (hobbies, education, etc):

I’m a software engineer, a Python book author and trainer, and one of the people who helped Python rise to the prominence it enjoys today.

I’ve been working in the software field for 3 decades, and earned BS and MS degrees in computer science from the University of Wisconsin. Back when I was being paid to do development, I worked on compilers for supercomputers and CAD systems, and large-scale applications at assorted start-ups.

Since quitting my “day job” two decades ago, I’ve been teaching Python classes in North America and Europe, and writing Python books that have sold over half a million units and span 11k published pages. To put that in perspective, back-of-the-envelope calculations suggest that the paper units of my books sold comprise about 650 tons (and counting).

If you’re interested in more details, check out my formal bio page at http://learning-python.com/books/formalbio.html.

 

Why did you start using Python?

I’ve been using Python since 1992, when it was at version 0.X, and the concept of a web site was still largely academic recreation. At the time, I was hired away from a compilers job to spend 6 months exploring available scripting languages as candidates for a portable GUI toolkit’s generated code.  I and a cohort wrote benchmark programs in Python, Tcl, and their contemporaries, and Python came out the clear winner in terms of readability, functionality, and robustness.

As it happened, I was also growing increasingly frustrated over the artificial complexities in the C++ language, and Python seemed to promise a better path for developers.  For people who already knew what they were doing, Python even then was a slam-dunk in terms of productivity gains.  Such began a fork in my career path which I’m still following today.

More recently, the original message of Python as a better tool for developers morphed into a promise of accessibility for all.  It’s great to see so many people getting into programming these days, but I think it’s crucial that we also set expectations realistically. Languages aside, software has always been both fun and hard stuff, and successful practitioners still need to understand the full “stack”, from scripting through chips.  Python’s just one part of that stack.

 

What other programming languages do you know and which is your favorite?

I’ve used too many to list here (including COBOL and FORTRAN, in prior lives).  I was heavily into Prolog in college, and did a thesis project which involved an optimized implementation of that language in C and 68K assembler.  Prolog’s novel approach to computing was exciting, but it was also too arcane for many programmers, and lacked a practical applications focus.  I also did a bit of AI work in Lisp.  And there’s this language called Python which I hear good things about…

Really, to people with a decent background in CS, most languages are just variations on a theme; it’s how they put their ideas together that matters. I prefer languages that do so more coherently than chaotically.

 

What projects are you working on now?

I’ve been working on a few Python projects lately, mostly for fun and personal use, but also as teaching examples for learners of the language. Among these are frigcal, a tkinter-based calendar GUI; mergeall, a folder sync tool for mirroring archives across devices; and the latest version of pymailgui, a POP/SMTP email client.

These programs are also in part intended as answers to some of the personal privacy dangers of today’s web.  Uploading personal data to clouds, and having my email or calendar scanned are just not options for me.  You can find these and other Python program examples at http://learning-python.com/books/recenthighlights.html#programs

 

Which Python libraries are your favorite (core or 3rd party)?

There’s too much out there to name a favorite per se, but I still find [tT]kinter great for quick GUI work, and leveraged the icalendar third-party library for interfacing with iCalendar files in the frigcal program.  Batteries-included works great, as long as those batteries work as advertised, and aren’t replaced while you depend on them.

 

Where do you see Python going as a programming language?

I suspect Python will continue to grow its user base in years ahead. This is especially so, given the push to add CS as a core topic in school curriculums around the world.  By offering a rare combination of functionality and usability, Python should continue to prove a valuable asset in educational contexts.

As for the language itself, I have concerns about the ongoing and sometimes reckless feature bloat that it shares with other open source projects, but I’ll pass on that subject here for space. For more details, check out my recent Python changes page at http://learning-python.com/books/python-changes-2014-plus.html

 

Why did you decide to write books about Python?

As already mentioned, I was there for the early days of Python, when there were no books, but the docs came with a plea for someone to write one.  At the time, I was also considering forming a start-up to build Python development tools, but the book idea won out in the end. It seemed a great opportunity to publicize and support something which I believed could make a difference in the field.

Early adopters like myself learned Python from its then-brief tutorial and source code, but that wasn’t going to work for the broader audience that Python could attract.  Moreover, a book was clearly required if Python was to ever grow beyond its then-small user base; especially in the 90s, a book could add legitimacy in ways that nothing else could.

Historic anecdote: in 1995 my publisher initially rejected the idea of doing a Python book, but warmed after I lobbied them about it over a period of months.  The result was the first Programming Python; which spawned Learning Python and the Pocket Reference; which all helped spawn the massive domain that Python is today.  Sometimes it pays to be stubborn.

 

What have you learned from writing about Python?

Too much to impart here, of course.  But I will pass on that writing and teaching are not innate talents; like other skills, it takes lots of practice to hone either.  My advice to anyone interested in either pursuit is to get out there and do it; the more criticism you receive, and the more live-and-in-person that criticism is, the better your message will become.  Beyond that, it’s largely a matter of saying what you mean, and rewriting till you can rewrite no more.

 

Thanks so much for doing the interview and sharing your insights!

 

May 02, 2016 12:30 PM


Automating OSINT

Expanding Skype Forensics with OSINT: Email Accounts

I will be the first to tell you that I know little about forensics compared to most law enforcement or private forensic examiners. One thing that I always found amazing was looking at the result of a forensic acquisition and seeing all of that magical data flowing out from it. Email addresses, phone numbers, usernames, social media, images, the list goes on and on. This always struck me as a place where OSINT could be applied as a follow-on to try to expand your knowledge of the acquired device and the owner. So I reached out to a few forensics gurus (thanks Shafik Punja and Jacques Boucher) to ask them where there is a good source of forensic information on a hard drive that I could use to begin querying online services for additional information. Jacques was kind enough to point out that the Skype database is in SQLite format and was a veritable treasure trove of information.

So what we are going to do in this post is twofold: we will build a Python script that can extract emails from any SQLite database and we will utilize the Full Contact API to perform lookups on the email accounts that we find. The final output of our adventure will be a spreadsheet that contains all of the social media profiles that were discovered. Let’s get started.

Python and SQLite

SQLite is a file-based database system that has many of the great features that most server-based database systems have, all compacted down into a nice tiny little file. The wonderful thing is that Python has built-in support for SQLite, which makes it very easy for us to interface with any SQLite file we choose. This script will be designed such that no matter what database is passed in, we will systematically walk through each discovered table and search through each column looking for email addresses, which should allow you to move beyond just Skype for exploration.

SQLite has a table called SQLITE_MASTER that describes the schema of the database. We will execute a query against this table to pull out all of the tables contained in the database. From there we will walk through each table and do a SQL SELECT statement to select all records from each table. When results come back, they are broken down into columns, and we will walk through each column one by one and attempt to extract email addresses. Let’s get started by creating a new Python script named sqlite_parser.py and entering the following code (download full source here):

import argparse
import re
import sqlite3

ap = argparse.ArgumentParser()
ap.add_argument("-d","--database",    required=True,help="Path to the SQLite database you wish to analyze.")
args = vars(ap.parse_args())

match_list  = []
regex_match = re.compile('[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+')

Now let’s get hooked up to the SQLite database and find all of the tables that are available:

# connect to the database
db     = sqlite3.connect(args['database'])
cursor = db.cursor()

cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")

tables = cursor.fetchall()

Let’s break this down a bit: we use argparse to take the path to the target database on the command line, compile a regular expression that matches email addresses, and create an empty list to hold the unique matches we find. We then connect to the database with the built-in sqlite3 module and query the SQLITE_MASTER table to retrieve the name of every table the database contains.

Now we need to loop over each table, query it for all of its data, and then walk through each column, applying our regular expression to try to extract email addresses. This will be a longer chunk of code, so let’s get to it:

for table in tables:

    print "[*] Scanning table...%s" % table

    # now do a broad select for all records
    cursor.execute("SELECT * FROM %s" % table[0])

    rows = cursor.fetchall()

    for row in rows:

        for column in row:

            try:
                matches = regex_match.findall(column)
            except:
                continue

            for match in matches:

                if match not in match_list:

                    match_list.append(match)

Now let’s put the final touches on the script to close our database resources and print out all of the matches we have encountered. Add the following lines of code to your script:

cursor.close()
db.close()

print "[*] Discovered %d matches." % len(match_list)

for match in match_list:
    print match

Well done! Now it is time to give it a spin to make sure it works. As Jacques described to me, you can locate the Skype SQLite database like so:

For Mac OSX:

/Users/<your_mac_username>/Library/Application\ Support/Skype/<your_skype_username>/main.db

For Windows:

%appdata%\Skype\main.db
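If you are not sure which account folder to look in, you can expand those paths with a short helper like the following (a rough sketch, not part of the original post; the glob patterns are assumptions based on the paths above):

import glob
import os
import sys

if sys.platform == "darwin":
    pattern = os.path.expanduser("~/Library/Application Support/Skype/*/main.db")
else:
    pattern = os.path.join(os.environ.get("APPDATA", ""), "Skype", "main.db")

# print every main.db found so you can pass it to the script with -d
for path in glob.glob(pattern):
    print(path)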


 

Let it Rip!

So let’s give this little unit a run and see some output. In this case I have the main.db file in the same location as my Python script:

Justins-MacBook-Pro:Desktop justin$ python sqlite_parser.py -d main.db

[*] Scanning table…DbMeta
[*] Scanning table…AppSchemaVersion
[*] Scanning table…Contacts
[*] Scanning table…LegacyMessages
[*] Scanning table…Calls
[*] Scanning table…Accounts
[*] Scanning table…Transfers

…more tables scanned here

[*] Discovered 68 matches.

[email protected]

… 67 other email addresses here.

If you get output like the above then you know everything is working! Now let’s integrate the Full Contact API so we can do some OSINT on the email addresses that we have extracted.


Integrating FullContact

We can now extract email accounts from any SQLite database we please. Of course, to leverage this information we want to attribute those email accounts to social media or some other online presence that we can use to gain additional intelligence. We will utilize FullContact (a data aggregator) to do some additional lookups on the email accounts that we discover. As we discover accounts we will add them to a CSV file so that we can explore the data using Excel or Google Fusion Tables; this will also allow us to easily bring the data into other tools. The first step is to sign up for a FullContact API key. Once you have done that, save your sqlite_parser.py script as sqlite_fullcontact.py (download from here) and let’s make some modifications. Right after your import sqlite3 line add the following:

import argparse
import re
import sqlite3
import requests
import time
import csv

full_contact_api_key = "YOURAPIKEY"

Perfect, we are just adding some additional modules and a variable to hold your FullContact API key. If you don’t have the requests library you can use pip to install it (view videos here for help). Now we are going to start adding code to the bottom of your script at line 53. Punch out the following:

# setup for csv writing
fd         = open("%s-social-media.csv" % args['database'],"wb")
fieldnames = ["Email Address","Network","User ID","Username","Profile URL"]
writer     = csv.DictWriter(fd,fieldnames=fieldnames)
writer.writeheader()

Next we need to walk through the list of matches from our SQLite parsing, and pass each email address off to the FullContact API to try to retrieve any results they may have. Let’s implement the code to do this now:

while match_list:

    # build the request up for Full Contact
    headers = {}
    headers['X-FullContact-APIKey'] = full_contact_api_key

    match   = match_list.pop()

    print "[*] Trying %s" % match
    url = "https://api.fullcontact.com/v2/person.json?email=%s" % match

    response = requests.get(url,headers=headers)

    time.sleep(2)

We have our request sent off and now it is time to test the results, and if there are good matches we need to store them in our CSV file. Let’s implement the code to do so:

    if response.status_code == 200:

        contact_object = response.json()

        if contact_object.has_key('socialProfiles'):

            for profile in contact_object['socialProfiles']:

                record = {}
                record['Email Address'] = match
                record['Network']       = profile.get("type","N/A")
                record['User ID']       = profile.get("id","N/A")
                record['Username']      = profile.get("username","N/A")
                record['Profile URL']   = profile.get("url","N/A")

                writer.writerow(record)

                # print some output to the screen
                print "Network: %s"  % profile.get("type","N/A")
                print "Username: %s" % profile.get("username","N/A")
                print "URL: %s"      % profile.get("url","N/A")
                print "ID: %s"       % profile.get("id","N/A")
                print

We now need to implement a separate check to see if the FullContact API wants us to wait before retrieving results. This can happen randomly, and to be honest I am not sure why. We are going to test for the HTTP response code 202 instead of 200 and then re-add the email address that was tested back to our matches list to make sure that we aren’t dropping email addresses. Be mindful of indentation here; it should be indented as far as our status code 200 check:

    elif response.status_code == 202:

        print "[*] Sleeping for a bit."

        # push this item back onto the list and sleep
        match_list.append(match)
        time.sleep(30)

fd.close()

Just like that! We test for the 202 code, and if we encounter it we push the email address back into our list of addresses to check and then sleep for 30 seconds. The last line of code simply closes the file handle associated with our CSV file. Now it is time to give it a run!


 

Let It Rip!

You are going to run this script in the exact same way that you ran the previous one but the output should be different, assuming you get good results from the FullContact API:

Justins-MacBook-Pro:Desktop justin$ python sqlite_fullcontact.py -d main.db

[*] Scanning table…DbMeta
[*] Scanning table…AppSchemaVersion
[*] Scanning table…Contacts

[*] Trying [email protected]

Network: twitter
Username: jms_dot_py
URL: https://twitter.com/jms_dot_py
ID: 817668451

 

Of course there were many more hits than just my email address but you can see how this can greatly expand your investigations once you have a SQLite database in your hands. In our next instalment we’ll take a look at how to build a map out of the IP addresses discovered to demonstrate how flexible our SQLite parsing is and how easy it is to adapt our scripts. Do you have a cool SQLite database that yields a pile of information like Skype does? Shoot me an email and let me know, because I would love to test it out! Thanks for reading, I’m looking forward to hearing from you.

May 02, 2016 12:00 PM


Caktus Consulting Group

ES6 For Django Lovers

ES6 for Django Lovers!

The Django community is not one to fall to bitrot. Django supports every new release of Python at an impressive pace. Active Django websites are commonly updated to new releases quickly and we take pride in providing stable, predictable upgrade paths.

We should be as adamant about keeping up that pace with our frontends as we are with all the support Django and Python put into the backend. I think I can make the case that ES6 is part of that natural forward pace for us, and I hope to help you get started upgrading the frontend half of your projects today.

The Case for ES6

As a Django developer and likely someone who prefers command lines, databases, and backends you might not be convinced that ES6 and other Javascript language changes matter much.

If you enjoy the concise expressiveness of Python, then ES6's improvements over Javascript should matter a lot to you. If you appreciate the organization and structure Django's common layouts for projects and applications provides, then ES6's module and import system is something you'll want to take advantage of. If you benefit from the wide variety of third-party packages the Python Package index makes available to you just a pip install away, then you should be reaching out to the rich ecosystem of packages NPM has available for frontend code, as well.

For all the reasons you love Python and Django, you should love ES6, too!

Well Structured Code for Your Whole Project

In any Python project, you take advantage of modules and packages to break up a larger body of code into sensible pieces. It makes your project easier to understand and maintain, both for yourself and other developers trying to find their way around a new codebase.

If you're like many Python web developers, the lack of structure between your clean, organized Python code and your messy, spaghetti Javascript code is something that bothers you. ES6 introduces a native module and import system, with a lot of similarities to Python's own modules.

import React from 'react';

import Dispatcher from './dispatcher.jsx';
import NoteStore from './store.jsx';
import Actions from './actions.jsx';
import {Note, NoteEntry} from './components.jsx';
import AutoComponent from './utils.jsx'

We don't benefit only from organizing our own code, of course. We derive an untold value from a huge and growing collection of third-party libraries available in Python and often specifically for Django. Django itself is distributed in concise releases through PyPI and available to your project thanks to the well-organized structure and the distribution service provided by PyPI.

Now you can take advantage of the same thing on the frontend. If you prefer to trust a stable package distribution for Django and other dependencies of your project, then it is a safe bet to guess that you are frustrated when you have to "install" a Javascript library by just unzipping it and committing the whole thing into your repository. Our Javascript code can feel unmanaged and fragile by comparison to the rest of our projects.

NPM has grown into the de facto home of Javascript libraries and grows at an incredible pace. Consider it a PyPI for your frontend code. With tools like Browserify and Webpack, you can wrap all the NPM installed dependencies for your project, along with your own organized tree of modules, into a single bundle to ship with your pages. These work in combination with ES6 modules to give you the scaffolding of modules and package management to organize your code better.

A Higher Baseline

This new pipeline allows us to take advantage of the language changes in ES6. It exposes the wealth of packages available through NPM. We hope it will raise the standard of quality within our front-end code.

This raised bar puts us in a better position to continue pushing our setup forward.

How Caktus Integrates ES6 With Django

Combining a Gulp-based pipeline for frontend assets with Django's runserver development web server turned out to be straightforward when we inverted the usual setup. Instead of teaching Django to trigger the asset pipeline, we embedded Django into our default gulp task.

Now, we set up livereload, which reloads the page when CSS or JS has been changed. We build our styles and scripts, transforming our Less and ES6 into CSS and Javascript. The task will launch Django's own runserver for you, passing along --port and --address parameters. The rebuild() task delegated to below will continue to monitor all our frontend source files for changes and automatically rebuild them when necessary.

// Starts our development workflow
gulp.task('default', function (cb) {
  livereload.listen();

  rebuild({
    development: true,
  });

  console.log("Starting Django runserver http://"+argv.address+":"+argv.port+"/");
  var args = ["manage.py", "runserver", argv.address+":"+argv.port];
  var runserver = spawn("python", args, {
    stdio: "inherit",
  });
  runserver.on('close', function(code) {
    if (code !== 0) {
      console.error('Django runserver exited with error code: ' + code);
    } else {
      console.log('Django runserver exited normally.');
    }
  });
});

Integration with Django's collectstatic for Deployments

Options like Django Compressor make integration with common Django deployment pipelines a breeze, but you may need to consider how to combine ES6 pipelines more carefully. By running our Gulp build task before collectstatic and including the resulting bundled assets — both Less and ES6 — in the collected assets, we can make our existing Gulp builds and Django work together very seamlessly.
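In practice that ordering can be scripted with something as small as the following (a minimal sketch, not Caktus's actual deployment code; the gulp task name and paths are assumptions):

import subprocess

def build_and_collect():
    # build the Less and ES6 bundles into the static directory Django knows about
    subprocess.check_call(["node_modules/.bin/gulp", "build"])
    # collect everything, including the freshly built bundles
    subprocess.check_call(["python", "manage.py", "collectstatic", "--noinput"])

if __name__ == "__main__":
    build_and_collect()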


May 02, 2016 12:00 PM


Dan Crosta

Delete Your Dead Code!

A few days ago, Ned Batchelder's post on deleting code made the rounds on HN, even though it was originally written in 2002. Here I want to echo a few of Ned's points, and take a stronger stance than he did: delete code as soon as you know you don't need it any more, no questions asked. I'll also offer some tips from the trenches for how to identify candidate dead code.

This is the second in an ongoing series on eating your vegetables in software engineering, on good, healthy practices for a happy and successful codebase. I don't (yet) know how long the series will be, so please stay tuned!

What Is Dead May Never Die

This heading isn't just an oh-so-clever and timely pop culture reference. Dead code, that is, code that can't possibly be executed by your program, is a real hindrance to the maintainability of your codebase. How many times have you gone to add what seemed like a simple feature or improvement, only to be stymied by the complexity of the code you have to work around and within? How much nicer would your life be if the practice of adding a feature or fixing a bug was as easy as you actually thought it would be during sprint planning?

Each time you want to make a change, you must consider how it interacts with each of the existing features, quirks, known bugs, and limitations of all the code that surrounds it. By having less code surrounding the feature you want to add, there's less to consider and less that can go wrong. Dead code is especially pernicious, because it looks like you need to consider interactions with it, but, since it's dead, it's merely a distraction. It can't possibly benefit you since it's never called.

The fact that dead code might never actually die is an existential threat to your ability to work with a given codebase. In the limit, if code that isn't called is never culled, the size of your application will grow forever. Before you know it, what might only be a few thousand lines of actual functionality is surrounded by orders of magnitude more code that, by definition, does nothing of value.

It's Got to Go

Ned (Batchelder, not Stark) was a little more nuanced and diplomatic than I'm going to be here:

Let's say you have a great class, and it has many methods. One day you discover that you no longer are calling a particular method. Do you leave it in or take it out?

There's no single answer to the question, because it depends on the class and the method. The answer depends on whether you think the method might be called again in the future.

From http://nedbatchelder.com/text/deleting-code.html.

I say: scorch the earth and leave no code alive. The best code is code that you don't even have.

For those less audacious than I, remember that version control has your back in case you ever need that code again.

That said, I've never experienced a need to add something back that I have previously deleted, at least not in the literal sense of adding back in, line for line, verbatim, a section of code I'd previously deleted.

Of course I'm not talking about things like reverting wrong-headed commits here -- we're all human, and I make as many mistakes as the next person. What I mean is, I've never deleted a feature, shipped to production, then come back weeks, or months later and thought to myself, "boy howdy, that code I wrote a year or more ago was probably pretty good, so let's put it back now." Codebases live and evolve with time, so the old code probably doesn't fit with the new ideas, techniques, frameworks, and styles in use today. I might refer back to the old version for a refresher, especially if it's particularly subtle, but I've never brought code back in, wholesale.

So, do yourself -- and your team -- a favor, and delete dead code as soon as you notice it.

How Did We Get Here?

Ned's post goes into great detail on how and why dead code happens -- perhaps the person making a change didn't think the code would be gone forever, and so commented it out or conditionally compiled it. Perhaps the person making a change didn't know enough to know that the code was actually dead (about which more later).

I'll add another hypothesis to the list: we might all just be lazy. It's definitely easier not to do something (i.e. to leave the code as-is) than to do something (delete it).

Laziness is, after all, one of the three great virtues of a programmer. But the Laziness that Larry Wall was talking about isn't this kind, but another kind: "The quality that makes you go to great effort to reduce overall energy expenditure." Viewed this way, deleting dead code is an act of capital-L Laziness -- doing something that's easy now to prevent yourself from having to do something hard later. We could all stand to develop more of this kind of Laziness, what I like to think of as "disciplined laziness," in our day-to-day habits.

How Do We Get Out Of Here?

I spend most of my time programming in Python, where, unfortunately, IDEs can't usually correctly analyze a complete codebase and identify never-called code automatically. But, with a combination of discipline and some run-time tooling, we can attack this problem from two sides.

For simple cases, a better sense of situational awareness can help identify and remove dead code while you're making changes. Imagine you're working on a particular function, and you notice that a branch of an if/else phrase can't be executed based on the valid values of the surrounding code. I call this "dead code in the small," and this is quite easy to reason about and remove, but it does require a bit more effort than one might ordinarily expend.
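Here is a contrived sketch of the pattern (a made-up example, not taken from any real codebase):

def normalize(fmt):
    # only ever returns "json" or "csv"
    return "json" if fmt == "json" else "csv"

def describe(fmt):
    kind = normalize(fmt)
    if kind == "json":
        return "machine readable"
    elif kind == "csv":
        return "spreadsheet friendly"
    else:
        # dead code in the small: normalize() can never return anything else,
        # so this branch is unreachable and should simply be deleted
        return "unknown format"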

Until you develop the habit of noticing this during the course of your ordinary programming routine, you can add a step to your pre-commit checklist: review the code around your changes for any now-dead code. This could happen just before you submit the code to your co-workers for review (you do do code review, right?) so that they don't have to repeat that process while reading through your changes.

Another kind of dead code happens when you remove the last usage of a class or function from within the code you're changing, without realizing that it's the last place that uses it. This is "dead code in the large," and is harder to discover in the course of ordinary programming unless you're lucky enough to have eidetic memory or know the codebase like the back of your hand.
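
As an invented illustration of this (the module and function names are hypothetical): nothing about the helper below looks dead on its own; you only discover that it is by searching the whole codebase, or by watching what actually runs.

# utils.py (hypothetical module)
def format_legacy_id(raw_id):
    """Format an ID in the old style used by the pre-redesign reports."""
    return "LEGACY-%s" % raw_id

# reports.py used to contain the last remaining call:
#
#     label = format_legacy_id(row.id)
#
# That line was removed in a refactor, so format_legacy_id() is now
# dead code in the large, even though it still looks perfectly healthy.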

This is where run-time tooling can help us out. At Magnetic, we're using Ned's (yes, the same Ned) coverage.py package to help inform our decisions about dead code. Ordinarily coverage is used during testing to ensure that your test cases appropriately exercise the code under test, but we also use it within our code "running as normal" to get insight into what is or isn't used:

import coverage

cov = coverage.Coverage(
    data_file="/path/to/my_program.coverage",
    auto_data=True,
    cover_pylib=False,
    branch=True,
    source=["/path/to/my/program"],
)
cov.start()

# ... do some stuff ...

cov.stop()
cov.save()

This sets up a Coverage object with a few options that make the report more usable. First, we tell coverage where to save its data (we'll use that later to produce a nice HTML report of what is and isn't used), and ask it to automatically load and append to that file with auto_data=True. Next we ask it not to bother calculating coverage over the standard library or installed packages -- that's not our code, so we'd expect that a lot of what's in there might not be used by us; it's not dead code that we need to maintain, so we can safely ignore it. We ask it to compute branch coverage (whether the true and false conditions of each if statement are hit). And finally, we point it at our sources, so that it can link its knowledge of what is or isn't called back to the source code when computing the report.
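
In practice the start/stop calls need to wrap the program's normal entry point. Here's a minimal sketch of one way to do that; the main() function and the paths are hypothetical, and the coverage calls are the same ones shown above:

import coverage

cov = coverage.Coverage(
    data_file="/path/to/my_program.coverage",
    auto_data=True,
    cover_pylib=False,
    branch=True,
    source=["/path/to/my/program"],
)

def main():
    # The application's real work would go here.
    pass

if __name__ == "__main__":
    cov.start()
    try:
        main()
    finally:
        # Stop measuring and write the data file even if main() raises.
        cov.stop()
        cov.save()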

After our program runs, we can compute the HTML coverage report like:

$ COVERAGE_FILE=/path/to/my_program.coverage coverage html -d /path/to/output

Which generates a report like:

Sample HTML report from coverage.py

(A complete example HTML coverage report is available as part of the coverage.py docs.)

The lines highlighted in red are lines that were never hit during the recorded execution of the program -- these are candidate lines (and, by extension, methods) for dead code elimination.

I'll leave you with three warnings about using this approach to finding and removing dead code:

  1. Be careful when reviewing the results of a coverage run -- the fact that a line or function wasn't executed during a single run of the program doesn't mean it's necessarily dead or unreachable in general. You must still check the codebase to determine whether it's completely dead in your application.
  2. Computing coverage means your program needs to do more work, so it will become slower when run in this mode. I wouldn't recommend running this all the time in production, but in a staging environment or in targeted scenarios you'll probably be OK. As always, if performance is an important concern, you should definitely measure what impact coverage has before you run it.
  3. Finally, don't trust code coverage reports from testing runs to find dead code. Some code might be dead, save for the tests that exercise it; and some code might be alive, but untested!

Parting Thoughts

To you, dear reader, I must apologize. I left out an important part of Ned's blog post when I quoted him earlier. He says:

There's no single answer to the question, because it depends on the class and the method. [...] A coarse answer could be: if the class is part of the framework, then leave it, if it is part of the application, then remove it.

If you're the author of a library or framework, rather than an application, then the question of dead code becomes in some ways harder and in other ways easier. You can't ever remove a part of your public API (except during major version bumps), so all of your public API is effectively live code, even if you, yourself, don't use it. But behind the public interface, dead code can still happen, and it should still be removed.

Delete your dead code!

May 02, 2016 11:55 AM


Python Software Foundation

We Want You to Run for the 2016 Board of Directors

You don't have to be an expert, or a Python celebrity. If you care about Python and you want to nurture our community and guide our future, we invite you to join the Board.

Nominations are open for the Python Software Foundation's Board of Directors now through the end of May 15. Nominate yourself if you are able and inspired to help the PSF fulfill its mission:

"The mission of the Python Software Foundation is to promote, protect, and advance the Python programming language, and to support and facilitate the growth of a diverse and international community of Python programmers."

If you know someone who would be an excellent director, ask if they would like you to nominate them!

What is the job? Directors do the business of the PSF.

Read "Expectations of Directors" for details.

There are 11 directors, elected annually for a term of one year. Directors are unpaid volunteers. Candidates from anywhere in the world are welcome; members of the Board do not need to be residents or citizens of the United States.

The deadline for nominations is the end of May 15, Anywhere on Earth ("AoE"). As long as it is May 15 somewhere, nominations are open. A simple algorithm is this: make your nominations by 11:59pm on your local clock and you are certain to meet the deadline. Ballots to vote for the board members will be sent May 20, and the election closes May 30.

If you're moved to nominate yourself or someone else, here are the instructions:

How to nominate candidates in the 2016 PSF Board Election.

While you're on that page, check if your membership makes you eligible to actually vote in the election.

For more info, see the PSF home page and the PSF membership FAQ.

May 02, 2016 09:24 AM


Jamal Moir

A Quick Guide to Slicing in Python - Become a Python Ninja

Slicing is an incredibly useful and powerful feature of Python. It gives you the ability to manipulate sequences with simple and concise syntax. Slicing has more uses than I can list here, but common applications include string manipulation and all kinds of numerical work; when using NumPy you will encounter slicing a lot.

UNDERSTANDING SLICING

Slicing has incredibly simple syntax: after a sequence-type variable, inside square brackets, you define a start point, an end point and a step, separated by colons, like so: var[start:end:step]. However, it doesn't work the way you might first assume it to. For example, if you choose the starting point to be 0 and the end point to be 5, the slice will include indexes 0, 1, 2, 3 and 4. The index 5 will not be included.
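
As a quick, concrete check of that rule (the list here is just an illustration):

nums = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(nums[0:5]) # >>>[0, 1, 2, 3, 4]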

One way to remember this is that the start point is inclusive and the end point is exclusive. In other words, the slice runs from the start point up to but not including the end point. For the more visual folks, another way is to picture a wall before each value in the sequence. The start and end points then reference these walls, not the values behind them. The slice is everything between those two walls, which of course does not include the value behind the end wall, because we only go up to that wall and don't pass it.

A diagram showing how to picture sequences when slicing.

The diagram above shows how to picture sequence indexes and the values they reference. Thinking this way will make understanding slicing a lot easier.


A BASIC SLICING EXAMPLE

Throughout this post we will be using the variables defined in the example below, but to avoid clutter they will only be shown in this first example. The output of each statement is included in a comment after it, starting with '>>>'.

l = [23, 42, 96, 7, 84, 99, 54, 1]
t = (2, 85, 64, 129, 92, 84, 1, 33)
s = 'Hello internet!'
# Basic slicing.
print(l[4:7]) # >>>[84, 99, 54]
print(t[4:7]) # >>>(92, 84, 1)
print(s[4:7]) # >>> 'o i'

The above lines of code take a slice from index 4 of each sequence up to but not including index 7. Note that strings can be sliced just like lists and tuples.

OMITTING START/END POINTS

When specifying a slice you can omit the start and end points. Omitting the start point sets it to 0, and omitting the end point sets it to the end of the sequence, so the last item is included.

# First five.
print(l[:5]) # >>>[23, 42, 96, 7, 84]
print(t[:5]) # >>>(2, 85, 64, 129, 92)
print(s[:5]) # >>>'Hello'

The above examples make a slice of the first 5 elements of each sequence. They omit the start point of the slice and therefore start from the beginning.

# Last five.
print(l[-5:]) # >>>[7, 84, 99, 54, 1]
print(t[-5:]) # >>>(129, 92, 84, 1, 33)
print(s[-5:]) # >>>'rnet!'

This example does the opposite and makes a slice of the last 5 elements. This time the end point is omitted and so the slice is taken to the end of the sequence.

STEPPING

Another useful feature of slicing is stepping. The step of your slice defines how it 'steps' through the sequence you are slicing. For example, a step of 5 would make it jump five elements with each step, and a step of -1 would make it step backwards through the sequence.

# Every other.
print(l[::2]) # >>>[23, 96, 84, 54]
print(t[::2]) # >>>(2, 64, 92, 1)
print(s[::2]) # >>>Hloitre!

The above example defines a step of 2, which produces a slice of every other element, i.e. every evenly indexed element. If you wanted a slice of every oddly indexed element instead, you would use 1 as the start point, as shown below.
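
Using the same variables as before, a start point of 1 combined with a step of 2 picks out the oddly indexed elements:

# Every other, starting from index 1.
print(l[1::2]) # >>>[42, 7, 99, 1]
print(t[1::2]) # >>>(85, 129, 84, 33)
print(s[1::2]) # >>>el nent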

# Reverse order.
print(l[::-1]) # >>>[1, 54, 99, 84, 7, 96, 42, 23]
print(t[::-1]) # >>>(33, 1, 84, 92, 129, 64, 85, 2)
print(s[::-1]) # >>>!tenretni olleH

This example reverses through the sequence and produces a slice containing every element but in reverse order.

Note that in the two examples above both the start and end points of the slice are omitted. This produces a slice of the whole sequence from start to finish. On its own this simply reproduces the sequence (for a list, it gives you a shallow copy, as the short example below shows), but when combined with a step it provides a useful way of manipulating a whole sequence at once.
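
For instance, reusing l from above:

copy_of_l = l[:]
print(copy_of_l) # >>>[23, 42, 96, 7, 84, 99, 54, 1]
print(copy_of_l is l) # >>>False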

And that's it for slicing! Python's slicing is an incredibly useful feature, and also incredibly easy to use once you have understood the way indexes work. Hopefully now you are well on your way to ninja-level slicing skills in Python.

Make sure you share this post so others can read it as well and don't forget to follow me on Twitter, add me on Google+ and subscribe to this blog by email to make sure you don't miss any posts that are useful to you.

May 02, 2016 01:55 AM