Planet Python

Last update: October 10, 2016 07:47 AM

October 10, 2016

Vasudev Ram

PDF cheat sheet: bin/oct/dec/hex conversion (0-255)

Numeral system image attribution

Hi readers,

Here is another in my series of PDF-generation applications built using xtopdf, my Python toolkit for PDF creation from various data formats.

This program generates a PDF cheat sheet for conversion of the numbers 0 to 255 between binary, octal, decimal and hexadecimal numeral systems. It can be useful for programmers, electrical / electronics engineers, scientists or anyone else who has to deal with numbers in those bases.

Here is the program, in file number_systems.py:

from __future__ import print_function
from PDFWriter import PDFWriter
import sys

'''
A program to generate a table of numbers from 
0 to 255, in 4 numbering systems:
    - binary
    - octal
    - decimal
    - hexadecimal
Author: Vasudev Ram
Copyright 2016 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: http://jugad2.blogspot.com
Product store on Gumroad: https://gumroad.com/vasudevram
'''

def print_and_write(s, pw):
    print(s)
    pw.writeLine(s)

sa, lsa = sys.argv, len(sys.argv)
if lsa == 1:
    sys.stderr.write("Usage: {} out_filename.pdf\n".format(sa[0]))
    sys.exit(1)

with PDFWriter(sa[1]) as pw:

    pw.setFont('Courier', 12)
    pw.setHeader('*** Number table: 0 to 255 in bases 2, 8, 10, 16 ***')
    pw.setFooter('*** By xtopdf: https://google.com/search?q=xtopdf ***')
    b = "Bin"; o = "Oct"; d = "Dec"; h = "Hex"
    header = "{b:>10}{o:>10}{d:>10}{h:>10}".format(b=b, o=o, d=d, h=h)

    for i in range(256):
        if i % 16 == 0:
            print_and_write(header, pw)
        print_and_write("{b:>10}{o:>10}{d:>10}{h:>10}".format( \
            b=bin(i), o=oct(i), d=str(i), h=hex(i)), pw)

    print_and_write(header, pw)

And here is a screenshot of first page of the PDF output, after running the program
with the command: python number_systems.py BODH-255.pdf

Ans here is a screenshot of the last page:

You can get the cheat sheet from my Gumroad store here: gum.co/BODH-255.

(You can also get email updates about my future products.)

I named the output BODH-255.pdf (from Binary Decimal Octal Hexadecimal 255), instead of BODH-0-255.pdf, because, of course, programmers count from zero, so the 0 can be implicit :)

(The formatting of the output is a little unconventional, due to the use of the header line occurring after every 16 lines in the table, but it is on purpose, because I am experimenting with different formats, to see the effect on readability / usability).

Notice 1) the smooth and systematic progression of the numbers in the vertical columns (how the values change like a car's odometer), and also 2) the relationships between numbers in different columns in the same row, when compared across any rows where one number is a multiple of another, e.g. look at the rows for decimal 32, 64, 128). Both these effects that we see are due to the inherent properties of the numbers with those values and their representation in those number bases. Effect 1) - like a car's odometer - is most noticeable in the Bin(ary) column - because of the way all the 1 bits flip to 0 when the number bumps up to a power of 2 - e.g. when binary 111 (decimal 7) changes to binary 1000 (decimal 8 == 2 ** 3), but the effect can be seen in the other columns as well.

The image at the top of the post is from this Wikipedia page:

Numeral system

- Vasudev Ram - Online Python training and consulting

Get updates on my software products / ebooks / courses.

Jump to posts: Python DLang xtopdf

Subscribe to my blog by email

My ActiveState recipes

FlyWheel - Managed WordPress Hosting

Share |

October 10, 2016 01:40 AM

Michał Bultrowicz

Continuous delivery of a Python library with AngularJS commit convention

I got tired of having to manually build and upload my library (Mountepy) to PyPI, so I decided to do what any sane programmer would do - set up automation [1]. But how would my scripts know whether they need to just update the README on PyPI and when to assemble and push a new version of the library? Thanks to the AngularJS commit convention! Oh, and Snap CI will run the whole thing. Why Snap, you ask? See my previous article - Choosing a CI service for your open-source project.

October 10, 2016 12:00 AM

October 09, 2016

Brett Cannon

What to look for in a new TV

I'm kind of an A/V nerd. Now I'm not hardcore enough to have a vinyl collection or have an amp for my TV, but all my headphones cost over $100 and I have a Sonos Playbar so I don't have to put up with crappy TV speakers. What I'm trying to say is that I care about the A/V equipment I use, but not to the extent that money is no object when it comes to my enjoyment of a movie (I'm not that rich and my wife would kill me if I spent that kind of money on electronics). That means I tend to research extensively before making a major A/V purchase since I don't do it very often and I want quality within reason which does not lend itself to impulse buying.

Prior to September 1, 2016, I had a 2011 Vizio television. It was 47", did 1080p, and had passive 3D. When I purchased the TV I was fresh out of UBC having just finished my Ph.D. so it wasn't top-of-the-line, but it was considered very good for the price. I was happy with the picture, but admittedly it wasn't amazing; the screen had almost a matte finish which led to horrible glare. I also rarely used the 3D in the television as 3D Blu-Ray discs always cost extra and so few movies took the time to actually film in 3D to begin with, instead choosing to do it in post-production (basically animated films and TRON: Legacy were all that we ever watched in 3D). And to top it all off, the TV took a while to turn on. I don't know what kind of LCB bulbs were in it, but they took forever to warm up and just annoyed me (yes, very much a first-world problem).

So when UHD came into existence I started to keep an eye on the technology and what television manufacturers were doing to incorporate the technology to entice people like me to upgrade. After two years of watching this space and one of the TVs I was considering having a one-day sale that knock 23% off the price, I ended up buying a 55" Samsung KS8000 yesterday. Since I spent so much time considering this purchase I figured I would try and distill what knowledge I have picked up over the years into a blog post so that when you decide to upgrade to UHD you don't have to start from zero knowledge like I did.

What to care about

First, you don't care about the resolution of the TV. All UHD televisions are 4K, so that's just taken care of for you. It also doesn't generally make a difference in the picture because most people sit too far away from their TV to make the higher resolution matter.

No, the one thing you're going to care about is HDR and everything that comes with it. And of course it can't be a simple thing to measure like size or resolution. Oh no, HDR has a bunch of parts to it that go into the quality of the picture: brightness, colour gamut, and format (yes, there's a format war; HD-DVD/Blu-Ray didn't teach the TV manufacturers a big enough lesson).

Brightness

A key part of HDR is the range of brightness to show what you frequently hear referred to as "inky blacks" and "bright whites". The way you get deep blacks and bright whites is by supporting a huge range of brightness. What you will hear about TVs is what their maximum nit is. Basically you're aiming for 1000 nits or higher for a maximum and as close to 0 as possible for a minimum.

Now of course this isn't as simple as it sounds as there's different technology being used to try and solve this problem.

LCD

Thanks to our computers I'm sure everyone reading this is familiar with LCD displays. But what you might not realize is how they exactly work. In a nutshell there are LED lightbulbs behind your screen that provides white light, and then the LCD pixels turn on and off the red/green/blue parts of themselves to filter out certain colours. So yeah, there are lightbulbs in your screen and how strong they are dictates how bright your TV screen will be.

Now the thing that comes into play here for brightness is how those LED bulbs are oriented in order to get towards that 0 nits for inky blacks. Typical screens are edge-list, which means there is basically a strip of LEDs on the edges of the TV that shine light towards the middle of the screen. This is fine and it's what screens have been working with for a while, but it does mean there's always some light behind the pixels so it's kind of hard to keep it from leaking out a little bit.

This is where local dimming comes in. Some manufacturers are now laying out the LED bulbs in an array/grid behind the screen instead of at the edges. What this allows is for the TV to switch dim an LED bulb if it isn't needed at full strength to illuminate a certain quadrant of the screen (potentially even switching off entirely). Obviously the denser the array, the more local dimming zones and thus the greater chance a picture with some black in it will be able to switch off an LED to truly get a dark black for that part of the screen. As for how often something you're watching is going to allow you to take advantage of such local dimming due to a dark area lining up within a zone is going to vary so this is going to be a personal call as to whether this makes a difference to you.

OLED

If I didn't have a budget and wanted the ultimate solution for getting the best blacks in a picture, I would probably have an OLED TV from LG. What makes these TVs so great is the fact that OLEDs are essentially pixels that provide their own light. What that means is if you want an OLED pixel to be black, you simply switch it off. Or to compare it to local dimming, it's as if every pixel was its own local dimming zone. So if you want truly dark blacks, OLED are the way to go. It also leads to better colours since the intensity of the pixel is consistent compared to an LCD where the brightness is affected by how far the pixel is from the LED bulb that's providing light to the pixel.

But the drawback is that OLED TVs only get so bright. Since each pixel has to generate its own light they can't reach really four-digit nit levels like the LCD TVs can. It's still much brighter than any HD TV, but OLED TVs don't match the maximum brightness of the higher-end LCD TVs.

So currently it's a race to see if LCDs can get their blacks down or if OLEDs can get their brightness up. But from what I have read, in 2016 your best bet is OLED if you can justify the cost to yourself (they are very expensive televisions).

Colour gamut

While having inky blacks and bright whites are nice, not everyone is waiting for Mad Max: Fury Road in black and white. That means you actually care about the rest of the rainbow, which means you care about the colour gamut of the TV for a specific colour space. TVs are currently trying to cover as much of the DCI-P3 colour space as possible right now. Maybe in a few years TVs will fully cover that colour space, at which point they will start worrying about Rec. 2020 (also called BT.2020), but there's still room in covering DCI-P3 before that's something to care about.

In the end colour gamut is probably not going to be something you explicitly shop for, but more of something to be aware of that you will possibly gain by going up in price on your television.

Formats

So you have your brightness and you have your colours, now you have to care about what format all of this information is stored in. Yes my friends, there's a new format war and it's HDR10 versus Dolby Vision. Now if you buy a TV from Vizio or LG then you don't have to care because they are supporting both formats. But if you consider any other manufacturer you need to decide on whether you care about Dolby Vision because everyone supports HDR10 these days but no one supports Dolby Vision at the moment except those two manufacturers.

There is one key reason that HRD10 is supported by all television makers: it's an open specification. By being free it doesn't cut into profits of TVs which obviously every manufacturer likes and is probably why HDR10 is the required HDR standard for Ultra Blu-Ray discs (Dolby Vision is supported on Ultra Blu-Ray, but not required). Dolby Vision, on the other hand, requires licensing fees paid to Dolby. Articles also consistently suggest that Dolby Vision requires new hardware which would also drive up costs of supporting Dolby Vision (best I can come up with is that since Dolby Vision is 12-bit and HDR10 is 10-bit that TVs typically use a separate chip for Dolby Vision processing).

Dolby Vision does currently have two things going for it over HDR10. One is that Dolby Vision is dynamic per frame while HDR10 is static. This is most likely a temporary perk, though, because HDR10 is gaining dynamic support sometime in the future.

Two is that Dolby Vision is part of an end-to-end solution from image capture to projection in the theatres. By making Dolby Vision then also work at home it allows for directors and editors to get the results they want for the cinema and then just pass those results along to your TV without extra work.

All of this is to say that Dolby Vision seems to be the better technology, but the overhead/cost of adding it to a TV along with demand will ultimately dictate whether it catches on. Luckily all TV manufacturers has agreed on the minimum standard of HDR10 so you won't be completely left out if you buy a TV from someone other than LG or Vizio.

Where to go for advice

When it comes time to buy a TV, I recommend Rtings.com for advice. They have a very nice battery of tests they put the TV through and give you nice level of detail on how they reached their scores for each test. They even provide the settings they used for their tests so you can replicate them at home.

You can also read what the Wirecutter is currently recommending. For me, though, I prefer Rtings.com and use the Wirecutter as a confirmation check if their latest TV round-up isn't too out-of-date.

Ultra HD Premium

If you want a very simple way to help choose a television, you can simply consider ones that are listed as Ultra HD Premium. That way you know the TV roughly meets a minimum set of specifications that are reasonable to want if you're spending a lot of money on a TV. The certification is new in 2016 and so there are not a ton of TVs yet that have the certification, but since TV manufacturers like having stamps on their televisions I suspect it will start to become a thing.

One thing to be aware of is that Vizio doesn't like the certification. Basically they have complained that the lack of standards around how to actually measure what the certification requires makes it somewhat of a moot point. That's a totally reasonable criticism and why using the certification as a filter for TVs consider is good, but to not blindly buy a TV just because it has Ultra HD Premium stamp of approval.

Why I chose my TV

Much like when I bought a soundbar, I had some restrictions placed upon me when considering what television I wanted. One, the TV couldn't be any larger than 55" (to prevent the TV from taking over the living room even though we should have a 65" based on the minimum distance people might sit from the TV). This immediately put certain limits on me as some model lines don't start until 65" like the Vizio Reference series. I also wasn't willing to spend CAD 4,000 on an LG, so that eliminated OLED from consideration. I also wanted HDR, so that eliminated an OLED that was only HD.

In the end it was between the 55" Samsung KS8000, 55" Vizio P-series, and the 50" Vizio P-series. The reason for the same Vizio model at different sizes is the fact that they use different display technology; the 50" has a VA display while the 55" has an IPS display. The former will have better colours but the latter has better viewing angles. Unfortunately I couldn't find either model on display here in Vancouver to see what kind of difference it made.

One other knock against the Vizio -- at least at 55" -- was that it wasn't very good in a bright room. That's a problem for us as our living room is north facing with a big window and the TV is perpendicular to those windows, so we have plenty of glare on the screen as the sun goes down. The Samsung, on the other hand, was rated to do better in a glare-heavy room. And thanks to a one-day sale it brought the price of the Samsung to within striking distance of the Vizio. So in the end with the price difference no longer a factor I decided to go with the TV that worked best with glare and maximized the size I could go with.

My only worry with my purchase is if Dolby Vision ends up taking hold and I get left in the cold somehow. But thanks to the HDR10 support being what Ultra Blu-Ray mandates I'm not terribly worried of being shut out entirely from HDR content. There's also hope that I might be able to upgrade my television in the future thanks to it using a Mini One Connect which breaks out the connections from the television. In other TVs the box is much bigger as it contains all of the smarts of the television, allowing future upgrades. There's a chance I will be able to upgrade the box to get Dolby Vision in the future, but that's just a guess at this point that it's even possible, let alone whether Samsung choose to add Do

It's been 48 hours with the TV and both Andrea and I are happy with the purchase; me because the picture is great, Andrea because I will now shut up about television technology in regards to a new TV purchase.

October 09, 2016 08:40 PM

Introducing Which Film

What I'm announcing

Today I'm happy to announce the public unveiling of Which Film! I'll discuss how the site came about and what drives it, but I thought I would first explain what it does: it's a website to help you choose what movie you and your family/friends should watch together. What you do is you go to the site, enter in the Trakt.tv usernames of everyone who wants to watch a film together (so you need at least two people and kept data like your watchlist and ratings on Trakt), and then Which Film cross-references everyone's watchlists and ratings to create a list of movies that people may want to watch together.

The list of movies is ranked based on a simple point scale. If a movie is on someone's watchlist it gets 4 points, movies rated 10 ⭐ get 3 points, 9 ⭐ get 2 points, and 8 ⭐ get 1 point. Everyone who participates contributes points and the movies are sorted from highest score to lowest. The reason for the point values is the assumption that watching a movie most people have not seen is the best, followed by a movies people rate very highly. In the case of ties, the movie seen longest ago (if ever) by anyone in the group is ranked higher than movies seen more recently by someone. That way there's a bigger chance someone will be willing to watch a movie again when everyone else wants to see it for the first time.

None of this is very fancy or revolutionary, but it's useful any time you get together with a group of friends to watch a film and you end up having a hard time choosing to watch. It can help even between spouses as it will identify movies both people want to watch, removing that particular point of contention.

The story behind Which Film

Now normally launching a new website wouldn't cause for any backstory, but this project has been under development for about six years, so there's a bit of history to it.

One fateful night ...

The inspiration for Which Film stemmed from one night when my co-creator Karl, his wife, my wife, and I got together and decided we wanted to watch a movie. This turned out to be quite an ordeal due to disparate tastes among all four of us. Karl and I thought that there had to be a better way to figure out a film we could all happily watch together. It didn't need to necessarily be something none of us had seen (although that was preferred), but it did need to be something that had a chance of making all of us happy if we chose to watch it.

This is when I realized that at least for me I had all of the relevant data to make such a decision on IMDb. I had been keeping my watchlist and ratings up-to-date on the site for years, to the point of amassing a watchlist over of 400 movies. Karl and I realized that had all four of us done that we could have cross-referenced the data and easily have found a film we all liked. Yes, it would require convincing everyone involved to keep track of what movies they wanted to see and rating movies that had seen, but we figured that wasn't an insurmountable problem. And so we decided we should code up a solution since we're both software developers.

You need an API, IMDb

But there was trouble with this project from the beginning. It turns out that while IMDb is happy for you to store your data on their servers, they don't exactly make it easy to get the data out. For instance, when I started looking into this they had two ways of getting to your data in some programmatic way: RSS and CSV files. The problem with RSS is that it was capped at (I believe) 200 entries, so I couldn't use it to access my entire data set. The issue with CSV was that you had to be logged in to download it. And the issue with both approaches was they were constantly broken for for different things simultaneously; when I looked into this last RSS was busted for one kind of list while CSV was broken for another. To top it all off the brokenness wasn't temporary, but lasted for lengths of time measured in months. That obviously doesn't work if you want to rely on the data and there's no official API (and IMDb at least used to aggressively go after anyone who use their name in a project).

Luckily I found Trakt. It has an API, it was accessible on a phone, and it wasn't ugly. The trick, though, was getting my data from IMDb to Trakt. Luckily there was a magical point when CSV exporting on IMDb worked for all of my lists, and so I downloaded the data and hacked together csv2trakt to migrate my data over (there is TraktRater for importing into Trakt as well, but at the time I had issues getting it to run on macOS).

What platform?

With my data moved over, we then had to choose what platform to have Which Film on. We toyed with the idea of doing a mobile app, but I'm an Android user and Karl is on iOS (and the same split for our wives), so that would have meant two apps. That didn't really appeal to either of us so we decided to do a website. We also consciously chose to do a single-page app to avoid maintaining a backend where would have to worry about uptime, potential server costs, etc. It also helps that there's a local company in Vancouver called Surge that does really nice static page hosting with a very reasonable free tier (when they get Let's Encrypt support I'll probably bump up to their paid tier if people actually end up using Which Film).

Choosing a programming language is never easy for me

Since we had found a website we were willing to ask people to use to store data, I had solved my data import problem, and we had decided on doing a website solution, next was what technology stack to use. The simple answer would have been Python, but for me that's somewhat boring since I obviously know Python. To make sure we both maximized our learning from this project we endeavoured to find a programming language neither of us had extensive experience in.

Eventually we settled on Dart. At the time we made this decision I worked at Google which is where Dart comes from, so I knew if I got really stuck with something I had internal resources to lean on. Karl liked the idea of using Dart because his game developer background appreciated the fact that Dart was looking into things like SIMD for performance. I also knew that Dart had been chosen by the ads product division at Google which meant it wasn't going anywhere. That also meant choosing Angular 2 was a simple decision since Google was using Dart with Angular 2 for products and so it would have solid Dart support.

But why six years?!?

As I have said, the site isn't complicated as you can tell from its source code, so you may be wondering why it took us six years before we could finish it. Well, since coming up with this idea I at least finished my Ph.D., moved five times between two countries,and worked for two different employers (if you don't count my Ph.D.). Karl had a similar busy life over the same timespan. And having me spend a majority of those six years in a different timezone didn't help facilitate discussions. At least we had plenty of time to think through various UX and design problems. ☺

If you give Which Film a try do let Karl and/or me know on Twitter (if you just want to see how the website works and you don't have a Trakt account you can use our usernames: brettcannon and kschmidt).

October 09, 2016 08:40 PM

Network protocols, sans I/O

Back in February I started taking a serious look at asynchronous I/O thanks to async/await. One of the things that led to me to looking into this area was when I couldn't find an HTTP/1.1 library that worked with async/await. A little surprised by this, I went looking for an HTTP header parser so that I could do the asynchronous I/O myself and then rely on the HTTP parsing library to at least handle the HTTP parts. But that's when I got even more shocked to find out there wasn't any such thing as an HTTP parsing library in Python!

It turns out that historically people have written libraries dealing with network protocols with the I/O parts baked in. While this has been fine up until now thanks to all I/O in Python being done in a synchronous fashion, this is going to be a problem going forward thanks to async/await and the move towards asynchronous I/O. Basically what this means is that network protocol libraries will need to be rewritten so that they can be used by both synchronous and asynchronous I/O.

If we're going to start rewriting network protocol libraries, then we might as well do it right from the beginning. This means making sure the library will work with any sort of I/O. This doesn't mean simply abstracting out the I/O so that you can plug in I/O code that can conform to your abstraction. No, to work with any sort of I/O the network protocol library needs to operate sans I/O; working directly off of the bytes or text coming off the network is the most flexible. This allows the user of the protocol library to drive the I/O in the way they deem fit instead of how the protocol library thinks it should be done. This provides the ultimate flexibility in terms of how I/O can be used with a network protocol library.

Luckily I wasn't the first to notice the lack of HTTP parsing library. Cory Benfield also noticed this and then did something about it. He created the hyper-h2 project to provide a network protocol library for HTTP/2 that does no I/O of its own. Instead, you feed hyper-h2 bytes off the network and it tells you -- through a state machine -- what needs to happen. This flexibility means that hyper-h2 has examples on how to use the library with curio, asyncio, eventlet, and Twisted (and now there's experimental support in Twisted for HTTP/2 using hyper-h2). Cory also gave a talk at PyCon US 2016 on the very topic of this blog post.

And HTTP/2 isn't the only protocol that has an implementation with no I/O. Nathaniel Smith of NumPy has created h11 which does for HTTP/1.1 what hyper-h2 does for HTTP/2. Once again, h11 does no I/O on its own and instead gets fed bytes which in turn drives a state machine to tell the user what to do.

So why am I writing this blog post? I think it's important to promote this approach to implementing network protocols, to the point that I have created a page at https://sans-io.readthedocs.io/ to act as a reference of libraries that have followed the approach I've outlined here. If you're aware of a network protocol library that performs no I/O (remember this excludes libraries that abstract out I/O), then please send a pull request to the GitHub project to have it added to the list. And if you happen to know a network protocol well, then please consider implementing a library that follows this approach of using no I/O so the community can benefit.

October 09, 2016 08:40 PM

Weekly Python Chat

Classes in Python

Let's chat about classes in Python: what are classes useful for and how are they different from classes in other languages?

October 09, 2016 05:00 PM

PyCon

The PyCon 2017 site has launched — thank you to our Launch Day Sponsors

The new PyCon 2017 web site recently went live, and the conference volunteers have worked hard bring the new site up-to-date with all of the essential details about 2017’s schedule, venue, and hotels. We are very happy with the new logo and banner that Beatrix Bodó crafted to help the conference celebrate its second and final year in beautiful Portland, Oregon!

With the release of the site we have also opened up the proposal forms for Talks, Tutorials, Posters, and Education Summit presentations. Visit our “Speak at PyCon” page to read the details of our Call For Proposals (CFP) and to learn about becoming part of the 2017 conference schedule.

Our launch-day sponsors this year — the organizations who have gone ahead and pledged to support and attend PyCon 2017, helping keep the conference affordable for as wide a range of attendees as possible — are from a broad array of fields that illustrate just how widely Python is used in today’s world.

Two of our Launch Day sponsors this year are supporting the conference at the Platinum level:

Platinum sponsor Anaconda from Contiuum Analytics “is the leading Open Data Science platform powered by Python.” Any of you who, like me, now use Anaconda as your go-to method for installing Python — and all the best data science libraries — will appreciate how crucial the tool has become to our community’s ability to get new users up and running quickly.
Platinum sponsor Microsoft “is proud to support the Python community through sponsored development of Python Tools for Visual Studio, Jupyter, CPython, Azure Machine Learning and organizations such as the PSF and NumFocus.” Millions of programmers around the world find themselves with support for Python already sitting on their desktop because their team or workplace uses Visual Studio.

Our launch-day Gold sponsors range from large Fortune 100 companies to small consultancies providing boutique consulting and programming:

Wingware — An IDE designed specifically for Python.
Stormpath — An identity management API for software teams.
Sentry — Real-time error tracking for your web apps, mobile apps, and games.
Nylas — A new platform for email-powered apps.
Lincoln Loop — A full-service software development agency specializing in Python and Django.
Leadpages — Helps businesses grow by collecting more leads and driving more sales.
Fusionbox — A custom software development agency specializing in Python/Django, ETL, and application security.
Demonware — The online services behind some of the world’s most popular game franchises.
Capital One — A Fortune 100 Company with the levels of innovation and agility that you’d typically find at a start-up.
Caktus Group — Django web application development done right.
American Greetings — A leading creator and manufacturer of innovative social expression products.

And, finally, we have already signed our first Silver sponsor!

O'Reilly — The media company that first put open source on the map for many programmers, providing shelves of books and references to help orient them to a world of operating systems and tools that they had not known existed.

For more details about each sponsor, see the detailed sponsor descriptions on our Sponsors Page and follow the links to their web sites. We look forward to seeing every one of these sponsors in the Expo Hall on Friday and Saturday of the main conference!

Subscribe to our blog here for regular updates as the conference approaches. To get you started, here are the most important dates for the conference through the rest of the year and up to PyCon itself:

2016

October 3 — Call For Proposals (CFP) for Talks, Tutorials, Posters, and the Education Summit
October 14 — Financial Assistance application opens
October 17 — Registration opens
November 30 — Tutorial proposals due

2017

January 3 — Talk, Poster, and Education Summit proposals due
February 1–12 — Talks, Tutorials, Posters, and Education Summit schedules announced
February 15 — Financial Assistance applications due
March 3 — Financial Assistance grants awarded
March 30 — Deadline to respond to Financial Assistance offer

In Portland, Oregon

May 17–18 — Two days of Tutorials
May 19–21 — Three main conference days including Talks, Expo Hall, Job Fair, and Posters
May 22–25 — Four days of Sprints

October 09, 2016 01:49 PM

Kracekumar Ramaraju

RC week 0001

This week, I made considerable progress on the BitTorrent client which I started a week back. The client is in a usable state to download the data from the swarm. The source code is available on GitHub. The project uses Python 3.5 async/await and asyncio. I presented the torrent client in RC Thursday five minute presentation evening slot. Here is the link to the slides.

Here is quick video demo recorded with asciinema.

In the demo, the client downloads a file flag.jpg of 1.3MB. Thanks a lot for Thomas Ballinger for hosting the tracker and the seeder. The tracker and the seeder are boons for developers writing torrent clients.

The downloader has two known major issue

The client has performance issue for a file of size greater than 50 MB.
The client doesn’t support UDP tracker - Clients interact with piratebay tracker only on UDP protocol whereas the other trackers support HTTP endpoint.

The week had long hours of debugging blocking code in asyncio, tweaking the client to support receive data from the deluge client - as soon as the handshake is successful, the client starts sending bitfield, have messages before receiving the interested message.

The next step is to integrate seeding functionality to the torrent client and enhance UDP tracker functionality.

On one of the weekday, I witnessed the momentous incident. I decided to join other two other recursers for lunch at a nearby unvisited outlet to pick up the lunch. The shop was spacious with bar setup and we decided to eat in the building. We ordered three Panner Tika and were munching our meal. The manager/owner/bartender showed up to fill the water, said to another person, “Let me fill water for you.” She greeted him and asked, “How’s your day going? and …”. He enthusiastically replied; filled the glass; enquired about the food and moved on. After few moments, he returned with the exuberant joy; handed over her handful of candies and stated this is a gift for her good manners. Speechless moment! What a revelation and life lesson! The small act of courtesy made the person feel elated and must have made his day. All day and night, we spend time thinking of lighting smiles on family, friends, mates and others. Take a moment to spread joy among the strangers. Later, I recalled a quote,

“How you treat totally irrelevant person defines who you are.”

October 09, 2016 05:09 AM

October 08, 2016

Podcast.init

Episode 78 - Lorena Mesa

Summary

One of the great strengths of the Python community is the diversity of backgrounds that our practitioners come from. This week Lorena Mesa talks about how her focus on political science and civic engagement led her to a career in software engineering and data analysis. In addition to her professional career she founded the Chicago chapter of PyLadies, helps teach women and kids how to program, and was voted onto the board of the PSF.

Brief Introduction

Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable.
Check out our sponsor Linode for running your awesome new Python apps. Check them out at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for your next project
You want to make sure your apps are error-free so give our other sponsor, Rollbar, a look. Rollbar is a service for tracking and aggregating your application errors so that you can find and fix the bugs in your application before your users notice they exist. Use the link rollbar.com/podcastinit to get 90 days and 300,000 errors for free on their bootstrap plan.
Visit our site to subscribe to our show, sign up for our newsletter, read the show notes, and get in touch.
By leaving a review on iTunes, or Google Play Music it becomes easier for other people to find us.
Join our community! Visit discourse.pythonpodcast.com to help us grow and connect our wonderful audience.
Your host as usual is Tobias Macey
Today we’re interviewing Lorena Mesa about what inspires her in her work as a software engineer and data analyst.

Use the promo code podcastinit20 to get a $20 credit when you sign up!

I’m excited to tell you about a new sponsor of the show, Rollbar.

One of the frustrating things about being a developer, is dealing with errors… (sigh)

Relying on users to report errors
Digging thru log files trying to debug issues
A million alerts flooding your inbox ruining your day…

With Rollbar’s full-stack error monitoring, you get the context, insights and control you need to find and fix bugs faster. It’s easy to get started tracking the errors and exceptions in your stack.You can start tracking production errors and deployments in 8 minutes - or less, and Rollbar works with all major languages and frameworks, including Ruby, Python, Javascript, PHP, Node, iOS, Android and more.You can integrate Rollbar into your existing workflow such as sending error alerts to Slack or Hipchat, or automatically create new issues in Github, JIRA, Pivotal Tracker etc.

We have a special offer for Podcast.__init__ listeners. Go to rollbar.com/podcastinit, signup, and get the Bootstrap Plan free for 90 days. That’s 300,000 errors tracked for free.Loved by developers at awesome companies like Heroku, Twilio, Kayak, Instacart, Zendesk, Twitch and more. Help support Podcast.__init__ and give Rollbar a try a today. Go to rollbar.com/podcastinit

Interview with Lorena Mesa

Introductions
How did you get introduced to Python?
How did your original interests in political science and community outreach lead to your current role as a software engineer?
You dedicate a lot of your time to organizations that help teach programming to women and kids. What are some of the most meaningful experiences that you have been able to facilitate?
Can you talk a bit about your work getting the PyLadies chapter in Chicago off the ground and what the reaction has been like?
Now that you are a member of the board for the PSF, what are your goals in that position?
What is it about software development that made you want to change your career path?
What are some of the most interesting projects that you have worked on, whether for your employer or for fun?
Do you think that the bootcamp you attended did a good job of preparing you for a position in industry?
What is your view on the concept that software development is the modern form of literacy? Do you think that everyone should learn how to program?

Keep In Touch

Twitter

Picks

Tobias
- Zencastr
Lorena
- Weapons of Math Destruction
- What I Talk About When I talk About Running

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Summary One of the great strengths of the Python community is the diversity of backgrounds that our practitioners come from. This week Lorena Mesa talks about how her focus on political science and civic engagement led her to a career in software engineering and data analysis. In addition to her professional career she founded the Chicago chapter of PyLadies, helps teach women and kids how to program, and was voted onto the board of the PSF.Brief IntroductionHello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable.Check out our sponsor Linode for running your awesome new Python apps. Check them out at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for your next projectYou want to make sure your apps are error-free so give our other sponsor, Rollbar, a look. Rollbar is a service for tracking and aggregating your application errors so that you can find and fix the bugs in your application before your users notice they exist. Use the link rollbar.com/podcastinit to get 90 days and 300,000 errors for free on their bootstrap plan.Visit our site to subscribe to our show, sign up for our newsletter, read the show notes, and get in touch.By leaving a review on iTunes, or Google Play Music it becomes easier for other people to find us.Join our community! Visit discourse.pythonpodcast.com to help us grow and connect our wonderful audience.Your host as usual is Tobias MaceyToday we're interviewing Lorena Mesa about what inspires her in her work as a software engineer and data analyst. Use the promo code podcastinit20 to get a $20 credit when you sign up! I’m excited to tell you about a new sponsor of the show, Rollbar. One of the frustrating things about being a developer, is dealing with errors… (sigh)Relying on users to report errorsDigging thru log files trying to debug issuesA million alerts flooding your inbox ruining your day...With Rollbar’s full-stack error monitoring, you get the context, insights and control you need to find and fix bugs faster. It's easy to get started tracking the errors and exceptions in your stack.You can start tracking production errors and deployments in 8 minutes - or less, and Rollbar works with all major languages and frameworks, including Ruby, Python, Javascript, PHP, Node, iOS, Android and more.You can integrate Rollbar into your existing workflow such as sending error alerts to Slack or Hipchat, or automatically create new issues in Github, JIRA, Pivotal Tracker etc. We have a special offer for Podcast.__init__ listeners. Go to rollbar.com/podcastinit, signup, and get the Bootstrap Plan free for 90 days. That's 300,000 errors tracked for free.Loved by developers at awesome companies like Heroku, Twilio, Kayak, Instacart, Zendesk, Twitch and more. Help support Podcast.__init__ and give Rollbar a try a today. Go to rollbar.com/podcastinitInterview with Lorena MesaIntroductionsHow did you get introduced to Python?How did your original interests in political science and community outreach lead to your current role as a software engineer?You dedicate a lot of your time to organizations that help teach programming to women and kids. What are some of the most meaningful experiences that you have been able to facilitate?Can you talk a bit about your work getting the PyLadies chapter in Chicago off the ground and what the reaction has been like?Now that you are a member of the board for the PSF, what are your goals in that position?What is it about software development that made you want to change your career path?What are some of the most interesting projects that you have worked on, whether for your employer or for fun?Do you think that the bootcamp you attended did a good job of preparing you for a position in industry?What is your view on the concept that software development is the modern form of

October 08, 2016 04:47 PM

October 07, 2016

Weekly Python StackOverflow Report

(xl) stackoverflow python report

These are the ten most rated questions at Stack Overflow last week.
Between brackets: [question score / answers count]
Build date: 2016-10-07 19:11:06 GMT

October 07, 2016 07:11 PM

Philip Semanchuk

Creating PDF Documents Using LibreOffice and Python, Part 3

This is part 3 of a 4-part series on creating PDFs using LibreOffice. You should read part 1 and part 2 if you haven’t already. This series is a supplement to a talk I gave at PyOhio 2016.

Here in part 3, I review the conversation we (the audience and I) had at the end of the PyOhio talk. I committed the speaker’s cardinal sin of not repeating (into the microphone) the questions people asked, so they’re inaudible in the video. In addition, we had some interesting conversations among multiple people that didn’t get picked up by the microphone. I don’t want them to get lost, so I summarized them here.

The most interesting thing I learned out of this conversation is that LibreOffice can open PDFs; once opened they’re like an ordinary LibreOffice document. You can edit them, save them to ODF, export to PDF, etc. Is this cool, or what?

First Question: What about Using Excel or Word?

One of the attendees jumped in to confirm that modern MS Word formats are XML-based. However, he went on to say, the XML contains a statement at the top that says something like “You cannot legally read the rest of this file”. I made a joke about not having one’s lawyer present when reading the file.

In all seriousness, I can’t find anything online that suggests that Microsoft’s XML contains a warning like that, and the few examples I looked at didn’t have have any such warning. If you can shed any light on this, please do so in the comments!

We also discussed the fact that one must invoke the office app (LibreOffice or Word, Excel, etc.) in order to render the document to PDF. LibreOffice has a reputation for performing badly when invoked repeatedly for this purpose. LibreOffice 5 may have addressed some of these problems, but as of this writing it’s still pretty new so the jury is still out on how this will work in practice.

Another attendee noted that Microsoft can save to LibreOffice format, so if Word (or Excel) is your document-editing tool of choice, you can still use LibreOffice to render it to PDF. That’s really useful if MS Office is your tool of choice but you’re doing rendering on a BSD/Linux server.

Question 2: What about Scraping PDFs?

The questioner noted that scraping a semi-complex PDF is very painful. It’d be ideal, he said, to be able to take a complex form like the 1040 and extract key value pairs of the question and answer. Is the story getting better for scraping PDFs?

My answer was that for the little experience I have with scraping PDFs, I’ve used PDFMiner, and the attendee said he was using the same.

Someone else chimed in that it’s a great use case for [Amazon’s] Mechanical Turk; in his case he was dealing with old faxes that had been scanned.

Question 3: Helper Libraries

Matt Wilson asked if it would make sense to begin building helper libraries to simplify common tasks related to manipulating LibreOffice XML. My answer was that I wasn’t sure since each project has very specific needs. Someone else suggested that one would have to start learning the spec in order to begin creating abstractions.

In the YouTube comments, Paul Hoffman1 called our attention to OdfPy a “thin abstraction over direct XML access”. It looks quite interesting.

Comment 1: Back to Scraping

One of the attendees commented that he had used Jython and PDFBox for PDF scraping. “It took a lot to get started, but once I started to figure out my way around it, it was a pretty good tool and it moved pretty speedily as compared to some of the other tools I used.” He went on to say that it was pretty complete and that it worked very well.

Question 4: About XML Parsing

The question was what I used to parse the XML, and my answer was that I used ElementTree from the standard library. Your favorite XML parsing library will work just fine.

Question 5: Protecting Bookmarks

The question was whether or not I did anything special to protect the bookmarks in the document. My answer was that I didn’t. (I’m not even sure it’s possible.) If you go through multiple rounds of editing with your client, those invisible bookmarks are inevitably going to get moved or deleted, so expect a little maintenance work related to that.

Comment 2: Weasyprint

One of the attendees commented that Weasyprint is a useful HTML/CSS to PDF converter. My observation was that tools in this class (HTML/CSS to PDF converters) are not as precise as either of the methods I outlined in this talk, but if you don’t need precision they’re probably a nice way to go.

Question 6: unoconv in a Web Server

Can one use unoconv in a Web server? My answer was that it’s possible, but it’s not practical to use it in-process. For me, it worked to do so in a demo of an intranet application, but that’s about as far as you want to go with it. It’s much more practical to use a distributed processing application (Celery, for example).

One of the attendees concurred that it makes sense to spin it off into a separate process, but “unoconv inexplicably crashes when it feels like it”.

Comment 3: Converting from Word

The initial comment was that pandoc might help with converting from Word to LibreOffice. This started a conversation which I’d summarize this way:

LibreOffice can open MS Office docs, so use that instead of pandoc and save as LibreOffice
If you open MS Office documents with LibreOffice, double check the formatting because it doesn’t always survive the transition
LibreOffice can open PDFs for editing.

October 07, 2016 06:31 PM

Mike Driscoll

An Intro to the Python Imaging Library / Pillow

The Python Imaging Library or PIL allowed you to do image processing in Python. The original author, Fredrik Lundh, wrote one of my favorite Python blogs when I first started learning Python. However PIL’s last release was way back in 2009 and the blog also stopped getting updated. Fortunately, there were some other Python folks that came along and forked PIL and called their project pillow. The pillow project is a drop-in replacement for PIL that also supports Python 3, something PIL never got around to doing.

Please note that you cannot have both PIL and pillow installed at the same time. There are some warnings in their documentation that list some differences between PIL and pillow that get updated from time to time, so I’m just going to direct you there instead of repeating them here since they will likely become out of date.

Install pillow

You can install pillow using pip or easy_install. Here’s an example using pip:

pip install Pillow

Note that if you are on Linux or Mac, you may need to run the command with sudo.

Opening Images

jelly

Pillow makes it easy to open an image file and display it. Let’s take a look:

from PIL import Image
 
image = Image.open('/path/to/photos/jelly.jpg')
image.show()

Here we just import the Image module and ask it to open our file. If you go and read the source, you will see that on Unix, the open method saves the images to a temporary PPM file and opens it with the xv utility. On my Linux machine, it opened it with ImageMagick, for example. On Windows, it will save the image as a temporary BMP and open it in something like Paint.

Getting Image Information

You can get a lot of information about an image using pillow as well. Let’s look at just a few small examples of what we can extract:

>>> from PIL import Image
>>> image = Image.open('/path/to/photos/jelly.jpg')
>>> r, g, b = image.split()
>>> histogram = image.histogram()
[384761, 489777, 557209, 405004, 220701, 154786, 55807, 35806, 21901, 16242]
>>> exif = image._getexif()
exif
{256: 1935,
 257: 3411,
 271: u'Panasonic',
 272: u'DMC-LX7',
 274: 1,
 282: (180, 1),
 283: (180, 1),
 296: 2,
 305: u'PaintShop Pro 14.00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00',
 306: u'2016:08:21 07:54:57',
 36867: u'2016:08:21 07:54:57',
 36868: u'2016:08:21 07:54:57',
 37121: '\x01\x02\x03\x00',
 37122: (4, 1),
 37381: (124, 128),
 37383: 5,
 37384: 0,
 37385: 16,
 37386: (47, 10),
 40960: '0100',
 40961: 1,
 40962: 3968,
 40963: 2232,
 41495: 2,
 41728: '\x03',
 41729: '\x01',
 41985: 0,
 41986: 0,
 41987: 0,
 41988: (0, 10),
 41989: 24,
 41990: 0,
 41991: 0,
 41992: 0,
 41993: 0,
 41994: 0}

In this example, we show how to extract the RGB (red, green, blue) values from the image. We also learn how to get a histrogram from the image. Note that I truncated the output a bit as the histogram’s output was much larger. You could graph the histrogram using another Python package, such as matplotlib. Finally, the example above demonstrates how to extract the EXIF information from the image. Again, I have shortened the output from this method a bit as it contained way too much information for this article.

Cropping Images

You can also crop images with pillow. It’s actually quite easy, although it may take you a little trial and error to figure it out. Let’s try cropping our jellyfish photo:

>>> from PIL import Image
>>> image = Image.open('/path/to/photos/jelly.jpg')
>>> cropped = image.crop((0, 80, 200, 400))
>>> cropped.save('/path/to/photos/cropped_jelly.png')

You will note that all we need to do is open the image and then call its crop method. You will need to pass in the x/y coordinates that you want to crop too, i.e. (x1, y1, x2, y2). In pillow, the 0 pixel is the top left pixel. As you increase your x value, it goes to the right. As you increase the y value, you go down the image. When you run the code above, you’ll end up with the following image:

cropped_jelly

That’s a pretty boring crop. I want to crop the jellyfish’s “head”. To get the right coordinates quickly, I used Gimp to help me figure out what coordinates to use for my next crop.

>>> from PIL import Image
>>> image = Image.open('/path/to/photos/jelly.jpg')
>>> cropped = image.crop((177, 882, 1179, 1707))
>>> cropped.save('/path/to/photos/cropped_jelly2.png')

If we run this code, we’ll end up with the following cropped version:

cropped_jelly2

That’s much better!

Using Filters

Original Jellyfish

There are a variety of filters that you can use in pillow to apply to your images. They are contained in the ImageFilter module. Let’s look at a couple of them here:

>>> from PIL import ImageFilter
>>> from PIL import Image
>>> image = Image.open('/path/to/photos/jelly.jpg')
>>> blurred_jelly = image.filter(ImageFilter.BLUR)
>>> blurred_jelly.save('/path/to/photos/blurry_jelly.png')

This will blur the jellyfish photo slightly. Here’s the result I got:

Blurry Jellyfish

Of course, most people like their images sharper rather than blurrier. Pillow has your back. Here’s one way to sharpen the image:

>>> from PIL import ImageFilter
>>> from PIL import Image
>>> image = Image.open('/path/to/photos/jelly.jpg')
>>> blurred_jelly = image.filter(ImageFilter.SHARPEN)
>>> blurred_jelly.save('/path/to/photos/sharper_jelly.png')

When you run this code, you’ll end up with the following:

Sharper Jellyfish

You can also use the ImageEnhance module for sharpening your photos, among other things.

There are other filters that you can apply too, such as DETAIL, EDGE_ENHANCE, EMBOSS, SMOOTH, etc. You can also write your code in such a way that you can apply multiple filters to your image.

You will likely need to download the images above to really be able to compare the differences in the filters.

Wrapping Up

You can do much more with the pillow package than what is covered in this short article. Pillow supports image transforms, processing bands, image enhancements, the ability to print your images and much more. I highly recommend reading the pillow documentation to get a good grasp of everything you can do.

Talk Python to Me

#79 Beeware Python Tools

Could you write me a Python app for the wide range of platforms out there? Oh, wait, I want them to be native GUI applications. And I need them on mobile (Android, iOS, tvOS, and watchOS) as well as major desktop apps. I also need them to appear indistinguishable from native apps (be a .app on macOS, .exe on Windows, etc). What technology would you use for this? This week I'll introduce you to a wide set of small, focused and powerful tools that make all of this, and more, possible. We're speaking with Russell Keith-Magee, founder of the Beeware project. Links from the show: <div style="font-size: .85em;"> Beeware Project: <a href='http://pybee.org/' target='_blank'>pybee.org</a> Russell Keith-Magee: <a href='https://twitter.com/freakboy3742' target='_blank'>@freakboy3742</a> Beeware on GitHub: <a href='https://github.com/pybee' target='_blank'>github.com/pybee</a> Organizing areas of the project Tools: <a href='http://pybee.org/project/projects/tools/' target='_blank'>pybee.org/project/projects/tools</a> Applications: <a href='http://pybee.org/project/projects/applications/' target='_blank'>pybee.org/project/projects/applications</a> Libraries: <a href='http://pybee.org/project/projects/libraries/' target='_blank'>pybee.org/project/projects/libraries</a> Bridges: <a href='http://pybee.org/project/projects/bridges/' target='_blank'>pybee.org/project/projects/bridges</a> Templates: <a href='http://pybee.org/project/projects/templates/' target='_blank'>pybee.org/project/projects/templates</a> Support: <a href='http://pybee.org/project/projects/support/' target='_blank'>pybee.org/project/projects/support</a> </div>

October 07, 2016 08:00 AM

October 06, 2016

Robin Parmar

How to retrieve photo metadata in Python

You'd think it would be easy to retrieve and even edit photo metadata. After all, we are living in the 21st century. But, no, some things prove more difficult than they should. A search for applications turned up quite a few that would display the metadata but none that would easily edit it.

OK, so there's always the programmatic approach. And for that I turn to Python. Let's see what the state of the art holds for us. (Hint: It's a bumpy ride.)

First, a couple of constraints. I develop on a Windows 10 machine, largely because that's the same computer all my other goodies are on. Yes, LINUX might be better for development, but not for desktop use. (Cue old debate.)

Second, because I am indeed living in this century, I prefer to use Python 3, the version that broke backwards compatibility. It's been around for 8 years, so is not exactly new.

What is metadata?
Metadata is simply a list of strings stored in an image file. These strings are carried along with the image, and can identify the author, camera characteristics, copyright information, and so on. There are two main metadata standards.

Exif, the Exchangeable image file format, has been around since 1995. It works with media such as WAV sound files, TIFF images, and JPG.

IPTC, defined by the International Press Telecommunications Council, is designed to standardise data for news gathering and journalism. There are two main parts, IPTC Core and IPTC Extension.

The remainder of this article will investigate methods of reading this data, in Python.

Take a PIL
When we think of images and Python we think first of the Python Imaging Library (PIL). Or, rather, it's more current fork, named Pillow.

You can install this useful library using the simple mechanism of typing at the command line:

pip install Pillow

This works across platforms.

In fact, if pip fails, I usually give up right away. Not because there aren't other ways to install. But if pip fails, it is a good indication that the library is not well maintained. As we shall see.

In any case, here's my test code. It relies on the fact I have defined a path to a good test file.

fn = 'path/to/some/file/tester.jpg'

def test_PIL():
    # test PIL
    from PIL import Image
    from PIL.ExifTags import TAGS
    print( '\n Test of PIL >> \n' )

    img = Image.open(fn)
    info = img._getexif()
    for k, v in info.items():
        nice = TAGS.get(k, k)
        print( '%s (%s) = %s' % (nice, k, v) )

Interrogating the image for Exif information returns a dictionary. We can iterate over this to see all the meta-tags. In this case a useful TAGS dictionary converts the numeric keys to English equivalents. So, instead of wondering what tag 315 means, we know that it is "artist".

Unfortunately, with my test data I noticed problems. (My programme output is at the bottom of this post, for convenience.) First, the "copyright" field contained scrambled text. Second, the "comment" field did not show up at all. This could perhaps be because Pillow reports only Exif and not IPTC. In any case, it is insufficient and unreliable.

Some dead ends

At this point I did a web search and came up with several likely candidates. But they soon proved frustrating.

The library pyexiv2 is deprecated in favour of GExiv2, part of the Gnome set and hence without a Windows installer nor any way to easily compile.

IPTCInfo is recommended in certain blog articles, like this one, already out-dated, though only four years old.

The automatic install for IPTCInfo failed, so I checked and discovered that the last code update was back in 2011. As a single module, it was easy enough to install manually. But then I discovered that it was not at all Python 3 compatible. My attempts to change the code manually ended in failure.

A piece of the Piexif

Piexif has been tested across platforms and has no dependencies. The documentation is a bit terse, but helpfully indicates that the main "load" function returns several dictionaries, plus a byte dump that forms a thumbnail. I wrote my code to avoid this.

def test_piexif():
    # test Piexif
    import piexif
    print( '\n Test of Piexif >>' )

    data = piexif.load(fn)
    for key in ['Exif', '0th', '1st', 'GPS', 'Interop']:
        subdata = data[key]
        print( '\n%s:' % key )
        for k, v in subdata.items():
            print( '%s = %s' % (k, v) )

I really don't know what "0th" and "1st" mean as dictionary names, but it does appear that I get out all of the meta tags I expect. In particular, the tag marked 37510 contains my comment.

Like PIL, this library has a dictionary to map the obscure codes to names. I thought I should interrogate this.

def test_piexif_inspect():
    # display all metadata names
    import piexif
    print( '\n Inspect piexif >>\n' )

    info = piexif.ImageIFD.__dict__
    l = ['%s = %s' % (v, k) for k, v in info.items()]
    l.sort()
    for item in l:
        print(item)

The result is missing a mapping for tag 37510, the very one I want to use!

OK, not such a big deal in this case. But what if I start using other tags and have to decipher the codes manually? Rather annoying.

You will also notice an odd encoding problem. Rather than contain my comment as is, the tag reads...

b'ASCII\x00\x00\x00MY TEST COMMENT!'

The b marks the string as binary, which is some odd Python 2 designation. The smart thing to do is decode this to a proper code page, but then we have the prefix cruft.

The following will do the trick, but I am again disliking the arbitrary nature of this decoding.

def test_piexif_use():
    import piexif
    print( '\n Usage of piexif >>' )
    data = piexif.load(fn)
    exif = data['Exif']
    comment = exif.get(37510, '').decode('UTF-8')
    comment = comment[8:]
    print( comment )

Try exifread

Finally, I stumbled upon the library exifread.

Here again is my test script. As before, I skip past some tags that are going to be long boring byte strings. And I progress in sorted order, just for convenience.

def test_exifread():
    import exifread
    print( '\n Test of exifread >>\n' )

    with open(fn, 'rb') as f:
        exif = exifread.process_file(f)

    for k in sorted(exif.keys()):
        if k not in ['JPEGThumbnail', 'TIFFThumbnail', 'Filename', 'EXIF MakerNote']:
            print( '%s = %s' % (k, exif[k]) )

The result? All of the tags I expect are present, in human-readable encoding. It seems that this obscure project is the winner. Some of the more popular libraries need to do some catching up!

Though, one big limitation exists even here. This library does not support editing the tags. For that, you will need to use one of the previous choices and work around the cruft.

Nonetheless, I hope this article saves you the time I unfortunately spent.

Output

Here follows my test output, for reference:

 Test of PIL >>

ExifVersion (36864) = b'0230'
ShutterSpeedValue (37377) = (9965784, 1000000)
ExifImageWidth (40962) = 600
DateTimeOriginal (36867) = 2011:06:09 01:20:59
DateTimeDigitized (36868) = 2011:06:09 01:20:59
MaxApertureValue (37381) = (0, 256)
SceneCaptureType (41990) = 0
MeteringMode (37383) = 5
LightSource (37384) = 0
Flash (37385) = 24
FocalLength (37386) = (77, 1)
CFAPattern (41730) = b'\x02\x00\x02\x00\x00\x01\x01\x02'
Make (271) = OLYMPUS IMAGING CORP.
Model (272) = E-P1
Orientation (274) = 1
ExifImageHeight (40963) = 600
Contrast (41992) = 0
Copyright (33432) = Robin Parmar  mar
ExposureBiasValue (37380) = (-3, 10)
XResolution (282) = (720000, 10000)
YResolution (283) = (720000, 10000)
ExposureTime (33434) = (1, 1000)
DigitalZoomRatio (41988) = (100, 100)
FocalLengthIn35mmFilm (41989) = 116
ExposureProgram (34850) = 3
ColorSpace (40961) = 65535
BodySerialNumber (42033) = H52502123
ResolutionUnit (296) = 2
WhiteBalance (41987) = 0
GainControl (41991) = 1
Software (305) = Adobe Photoshop CS5 Windows
DateTime (306) = 2011:08:22 21:39:05
LensMake (42035) = Pentax
LensModel (42036) = smc Pentax F A77 Limited
Saturation (41993) = 0
Artist (315) = Robin Parmar
Sharpness (41994) = 0
FileSource (41728) = b'\x03'
CustomRendered (41985) = 0
ExposureMode (41986) = 1
ExifOffset (34665) = 268
ISOSpeedRatings (34855) = 200

 Test of Piexif >>

Exif:
36864 = b'0230'
37377 = (9965784, 1000000)
40962 = 600
36867 = b'2011:06:09 01:20:59'
36868 = b'2011:06:09 01:20:59'
37381 = (0, 256)
37510 = b'ASCII\x00\x00\x00MY TEST COMMENT!'
37383 = 5
37384 = 0
37385 = 24
37386 = (77, 1)
41988 = (100, 100)
41986 = 1
40963 = 600
37380 = (-3, 10)
41730 = b'\x02\x00\x02\x00\x00\x01\x01\x02'
33434 = (1, 1000)
41728 = b'\x03'
41989 = 116
34850 = 3
42033 = b'H52502123'
40961 = 65535
41990 = 0
34855 = 200
41987 = 0
41991 = 1
41992 = 0
42035 = b'Pentax'
42036 = b'smc Pentax F A77 Limited'
41993 = 0
41994 = 0
41985 = 0

0th:
283 = (720000, 10000)
296 = 2
34665 = 11444
306 = b'2011:08:22 21:39:05'
270 = b''
271 = b'OLYMPUS IMAGING CORP.'
272 = b'E-P1'
305 = b'Adobe Photoshop CS5 Windows'
274 = 1
33432 = b'Robin Parmar'
282 = (720000, 10000)
315 = b'Robin Parmar'

1st:
513 = 878
514 = 10416
259 = 6
296 = 2
282 = (72, 1)
283 = (72, 1)

GPS:

Interop:

 Test of exifread >>

EXIF BodySerialNumber = H52502123
EXIF CVAPattern = [2, 0, 2, 0, 0, 1, 1, 2]
EXIF ColorSpace = Uncalibrated
EXIF Contrast = Normal
EXIF CustomRendered = Normal
EXIF DateTimeDigitized = 2011:06:09 01:20:59
EXIF DateTimeOriginal = 2011:06:09 01:20:59
EXIF DigitalZoomRatio = 1
EXIF ExifImageLength = 600
EXIF ExifImageWidth = 600
EXIF ExifVersion = 0230
EXIF ExposureBiasValue = -3/10
EXIF ExposureMode = Manual Exposure
EXIF ExposureProgram = Aperture Priority
EXIF ExposureTime = 1/1000
EXIF FileSource = Digital Camera
EXIF Flash = Flash did not fire, auto mode
EXIF FocalLength = 77
EXIF FocalLengthIn35mmFilm = 116
EXIF GainControl = Low gain up
EXIF ISOSpeedRatings = 200
EXIF LensMake = Pentax
EXIF LensModel = smc Pentax F A77 Limited
EXIF LightSource = Unknown
EXIF MaxApertureValue = 0
EXIF MeteringMode = Pattern
EXIF Saturation = Normal
EXIF SceneCaptureType = Standard
EXIF Sharpness = Normal
EXIF ShutterSpeedValue = 1245723/125000
EXIF UserComment = MY TEST COMMENT!
EXIF WhiteBalance = Auto
Image Artist = Robin Parmar
Image Copyright = Robin Parmar
Image DateTime = 2011:08:22 21:39:05
Image ExifOffset = 11444
Image ImageDescription =
Image Make = OLYMPUS IMAGING CORP.
Image Model = E-P1
Image Orientation = Horizontal (normal)
Image ResolutionUnit = Pixels/Inch
Image Software = Adobe Photoshop CS5 Windows
Image XResolution = 72
Image YResolution = 72
Thumbnail Compression = JPEG (old-style)
Thumbnail JPEGInterchangeFormat = 878
Thumbnail JPEGInterchangeFormatLength = 10416
Thumbnail ResolutionUnit = Pixels/Inch
Thumbnail XResolution = 72
Thumbnail YResolution = 72

October 06, 2016 09:29 PM

François Dion

Improving your communications: Professional Audio-Video Production on Linux

Pro AV on Linux

I'll be presenting on the subject of Professional Audio-Video Production on Linux, next week at TriLug.

From concept to finished product, it has never been easier to obtain professional results when it comes to audio-video production on Linux.

We will cover some of the hardware that should be part of your production suite, from microphones to jog wheels and highlight some of the top tools for animation, audio, broadcasting, effects, modeling, music, transcoding and video. We will also go beyond the usual suspects and introduce some tools that might not be typically used for AV production.

By the end of the presentation, you will have all the tools you need to improve the quality of your communications, for your personal enjoyment, your career, or your business.

When

Thursday, 13 October 2016 - 7:00pm to 9:00pm

Where

The Frontier, 800 Park Offices Drive, Durham, NC

More info at the Trilug website.

Francois Dion
@f_dion

October 06, 2016 05:35 PM

Machinalis

First release of mypy-django

We’re happy to make a release of mypy-django 0.1.1, the first of many!

It’s a collection of type stubs for using with the mypy static type checking tool (and also with other PEP-484 compliant tools). If you’re a Django developer who wants to improve the quality of documentation and checking of your code, you might be interested. You can take a look at the README file for some examples, or at the annotated version of the Django tutorial for a full project.

Our goal is to be able to annotate (and validate annotations) in our Django projects, and have yet another tool in our software quality arsenal at Machinalis... But given that we’re releasing it under a BSD license, it’s also a tool for everyone else. Feel free to let us know if you’re using it or if we can help you integrate it with your projects.

Support is mainly for Django 1.10, but older versions should work reasonably well.

Supported Components

HttpRequest and HttpResponse objects
- Including supporting classes like QueryDict and file objects
Generic views
URL resolver
Other miscellaneous components required by the above (timezones, cookies, ...)

Github: https://github.com/machinalis/mypy-django
Documentation: Check the included README file
Twitter: @machinalis, @dmoisset

October 06, 2016 02:26 PM

Richard Gomes

Strong type checking in Python

This article describes a Python annotation which combines documentation with type checking in order to help Python developers to gain better understanding and control of the code, whilst allowing them to catch mistakes on the spot, as soon as they occur.

Being a Java developer previously but extradited to Python by my own choice, I sometimes feel some nostalgy from the old times, when the Java compiler used to tell me all sorts of stupidities I used to do.

In the Python world no one is stupid obviously, except probably me who many times find myself passing wrong types of arguments by accident or by pure stupidity, in case you accept the hypothesis that there's any difference between the two situations.

When you are coding your own stuff, chances are that you know very well what is going on. In general, you have the entire bloody API alive and kicking inside your head. But when you are learning some third party software, in particular large frameworks, chances are that your code is called by something you don't understand very well, which decides to pass arguments to your code which you do not have a clue what they are about.

Documentation

Documentation is a good way of sorting out this difficulty. Up-to-date documentation, in particular, is the sort of thing I feel extremely happy when I have chance to find one. My mood is being constantly crunched these days, if you understand what I mean.

Outdated documentation is not only useless but also undesirable. Possibly for this reason some (or many?) people prefer no documentation at all, since absence of information is better than misinformation, they defend.

It's very difficult to keep documentation up-to-date, unless you are forced somehow to do so. Maybe at gun point?

Strong type checking

I'm not in the quest of convincing anyone that strong type checking is good or useful or desirable. Like everything in life, there are pros and cons.

On the other hand, I'd like to present a couple of benefits which keep strong type checking in my wishlist:

* I'd like to have the ability to stop the application as soon as a wrong type is received by a function or returned by a function to its caller. Stop early, catch mistakes easily, immediately, on spot.

* I'd like to identify and document argument types being passed by frameworks to my code, easily, quickly, effectively, without having to turn the Internet upside down every time I'm interested to learn what argument x is about.

Introducing sphinx_typesafe

Doing a bit of research, I found an interesting library called IcanHasTypeCheck (or ICHTC for short), which I ended up rewriting almost from scratch during the last revision and I've renamed it to sphinx_typesafe.

Let me explain the idea:

In the docstring of a function or method, you employ Sphinx-style documentation patterns in order to tell types associated to variables.

If your documentation is pristine, the number of arguments in the documentation match the number of arguments in the function or method definition.

If your logic is pristine, the types of arguments you documented match the types of arguments actually passed to the function or method at runtime, or returned by the function or method to the caller, at runtime.

You just need to add an annotation @typesafe before the function or method, and sphinx_typesafe checks if the documentation matches the definition.

If you don't have a clue about the type of an argument, simply guess some unlikely type, say: None. Then run the application and sphinx_typesafe will interrupt the execution of it and report that the actual type does not match None. The next step is obviously substitute None by the actual type.

Benefits

A small example tells more than several paragraphs.
Imagine that you see some code like this:

    import math

    def d(p1, p2):
        x = p1.x - p2.x
        y = p1.y - p2.y
        return math.sqrt(x*x + y*y)

Imagine that you had type information about it, like this:

    import math
    from sphinx_typesafe import typesafe

    @typesafe
    def d(p1, p2):
        """
        :type p1: shapes.Point
        :type p2: shapes.Point
        :rtype : float
        """
        x = p1.x - p2.x
        y = p1.y - p2.y
        return math.sqrt(x*x + y*y)

Now you are able to understand what this code is about, quickly!.
In particular, you are able to tell what it is the domain of types this code is intended to operate on.

When you run this code, if this function receives a shapes.Square instead of a shape.Point, it would stop immediately. Notice that, eventually, a shape.Square may have components x and y which would make the function return wrong results silently. Imagine your test cases catching this situation!

So, I hope I demonstrated the two benefits I was interested on.

Missing Features

Polymorphism

Sometimes I would like to tell that an argument can be a file but also a str. At the moment I can say that the argument can be types.NotImplementedType meaning "any type". But I would like something more precise, like this:

    :type f: [file, str]

This is not difficult to implement, actually, but we are not there yet.

Non intrusive

I would like to have a non intrusive way to turn on type checking and a very cheap way of turning off type checking, if possible without any code change.

Thinking more about use cases, I guess that type checking is very useful when you are developing and, in particular, when you are running your test suite. You are probably not interested on having the overhead of type checking on production code which was theoretically exhaustively tested.

Long story short, I would like to integrate sphinx_typesafe with pytest, so that an automatic decoration of functions and methods would happen automagically and without any code change.

If pytest finds a docstring which happens to contain a Sphinx-style type specification on it, @typesafe is applied to the function or method. That would be really nice! You could also run your code in production without type checking since type checking was never turned on in the first place.

The idea looks to be great, but my ignorance on pytest internals and my limited time prevents me of going ahead. Maybe in future!

Python3 support

The sources of sphinx_typesafe itself are ready for Python3, but sphinx_typesafe does not handle properly your sources written in Python3 yet. It's not difficult to implement, actually: it's just a matter of adjusting one function, but we are not there yet. Maybe you feel compelled to contribute?

More Information

https://pypi.python.org/pypi/sphinx_typesafe

Credits

Thank Klaas for inspiration and his IcanHasTypeCheck (or ICHTC for short).

October 06, 2016 02:10 PM

Import Python

ImportPython Issue 93

Worthy Read

Setting up Sublime Text for Python Developers

sublime

We have been sharing Daniel's articles and videos from this youtube channel https://www.youtube.com/channel/UCI0vQvr9aFn27yR6Ej6n5UA for a while now. Daniel Bader just published his book on Sublime Text for Python Developers. Have a look at his book if you are a sublime text user. Here is a 30% discount for all ImportPython Subscribers.

Implementing the "Soft Delete" Pattern with Flask and SQLAlchemy

flask
, SQLAlchemy

You can find lots of reasons to never delete records from your database. The Soft Delete pattern is one of the available options to implement deletions without actually deleting the data. It does it by adding an extra column to your database table(s) that keeps track of the deleted state of each of its rows. This sounds straightforward to implement, and strictly speaking it is, but the complications that derive from the use of soft deletes are far from trivial. In this article I will discuss some of these issues and how I avoid them in Flask and SQLAlchemy based applications.

A Dramatic Tour through Python’s Data Visualization Landscape (including ggplot and Altair)

data visualization

Comprehensive listing of all data visualization packages with small codesnippets.

Database concurrency in Django the right way

django

Guilherme Caminha explores the utility of using on_commit hook available from 1.9 onwards in sequencing part of a time consuming task in django view and rest offloaded to an async process.

Python has come a long way. So has job hunting.

Try Hired and get in front of 4,000+ companies with one application. No more pushy recruiters, no more dead end applications and mismatched companies, Hired puts the power in your hands.

Sponsor

Thinking in coroutines

async-io

Lukasz Langa uses asyncio source code to explain the event loop, blocking calls, coroutines, tasks, futures, thread pool executors, and process pool executors.

Automate generation of man pages for python click applications

opensource project

Click is my go to Python package for creating command line applications. click-man will generate one man page per command of your click CLI application specified in console_scripts in your setup.py.

PyDev of the Week: Bryan Van de Ven

interview

Bryan is a core developer of the Bokeh project, which is a visualization package for Python. He has also helped with the development of Anaconda.

Flashlight is a lightweight Python library for analyzing and solving quadrotor control problems.

Flashlight enables you to easily solve for minimum snap trajectories that go through a sequence of waypoints, compute the required control forces along trajectories, execute the trajectories in a physics simulator, and visualize the simulation results.

Church

opensource project

Church is a library to generate fake data. It's very useful when you need to bootstrap your database.

5 music things and Python

Raspberry and Python projects/scripts.

validr

A simple,fast,extensible python library for data validation.

Upcoming Conference / User Group Meet

Santa Cruz Python Meetup

Projects

tf-agent - 27 Stars, 1 Fork

tensorflow reinforcement learning agents for OpenAI gym environments

become - 5 Stars, 0 Fork

Make one object become another.

python-line-api - 4 Stars, 0 Fork

SDK of the LINE Messaging API for Python.

football-stats - 2 Stars, 0 Fork

Football stats is a system which has the purpose of helping football match analyses. The final goal of the project is to have the capability of ball and players' position analysis, creating heatmaps and statistics of different actions or situations.

pytocli - 2 Stars, 0 Fork

A Python lib to generate CLI commands

xfce4-system-monitor - 1 Stars, 0 Fork

An xfce panel plugin to display the necessary information of the system.

October 06, 2016 01:39 PM

A. Jesse Jiryu Davis

Computer Science For All

MongoDB offered a paid fellowship to two teachers this summer. Jeremy Mellema and Tim Chen worked with the MongoDB Education Team in our office, developing a computer science curriculum based on Python, MongoDB, and other technologies. This fall, they're starting to teach the new class in NYC public high schools in Hell's Kitchen and the Bronx.

I followed them for a day, talking to them and their students and taking pictures. Read the story on the MongoDB Engineering Journal:

Investing In CS4All: Training Teachers and Helping Them Build Curricula

Images © A. Jesse Jiryu Davis

October 06, 2016 09:40 AM

Abu Ashraf Masnun

Async Python: The Different Forms of Concurrency

With the advent of Python 3 the way we’re hearing a lot of buzz about “async” and “concurrency”, one might simply assume that Python recently introduced these concepts/capabilities. But that would be quite far from the truth. We have had async and concurrent operations for quite some times now. Also many beginners may think that asyncio is the only/best way to do async/concurrent operations. In this post we shall explore the different ways we can achieve concurrency and the benefits/drawbacks of them.

Defining The Terms

Before we dive into the technical aspects, it is essential to have some basic understanding of the terms frequently used in this context.

Sync vs Async

In Syncrhonous operations, the tasks are executed in sync, one after one. In asynchronous operations, tasks may start and complete independent of each other. One async task may start and continue running while the execution moves on to a new task. Async tasks don’t block (make the execution wait for it’s completion) operations and usually run in the background.

For example, you have to call a travel agency to book for your next vacation. And you need to send an email to your boss before you go on the tour. In synchronous fashion, you would first call the travel agency, if they put you on hold for a moment, you keep waiting and waiting. Once it’s done, you start writing the email to your boss. Here you complete one task after another. But if you be clever and while you are waiting on hold, you could start writing up the email, when they talk to you, you pause writing the email, talk to them and then resume the email writing. You could also ask a friend to make the call while you finish that email. This is asynchronicity. Tasks don’t block one another.

Concurrency and Parallelism

Concurrency implies that two tasks make progress together. In our previous example, when we considered the async example, we were making progress on both the call with the travel agent and writing the email. This is concurrency.

When we talked about taking help from a friend with the call, in that case both tasks would be running in parallel.

Parallelism is in fact a form of concurrency. But parallelism is hardware dependent. For example if there’s only one core in the CPU, two operations can’t really run in parallel. They just share time slices from the same core. This is concurrency but not parallelism. But when we have multiple cores, we can actually run two or more operations (depending on the number of cores) in parallel.

Quick Recap

So this is what we have realized so far:

Sync: Blocking operations.
Async: Non blocking operations.
Concurrency: Making progress together.
Parallelism: Making progress in parallel.

Parallelism implies Concurrency. But Concurrency doesn’t always mean Parallelism.

Threads & Processes

Python has had Threads for a very long time. Threads allow us to run our operations concurrently. But there was/is a problem with the Global Interpreter Lock (GIL) for which the threading could not provide true parallelism. However, with multiprocessing, it is now possible to leverage multiple cores with Python.

Threads

Let’s see a quick example. In the following code, the worker function will be run on multiple threads, asynchronously and concurrently.

import threading
import time
import random


def worker(number):
    sleep = random.randrange(1, 10)
    time.sleep(sleep)
    print("I am Worker {}, I slept for {} seconds".format(number, sleep))


for i in range(5):
    t = threading.Thread(target=worker, args=(i,))
    t.start()

print("All Threads are queued, let's see when they finish!")

Here’s a sample output from a run on my machine:

$ python thread_test.py
All Threads are queued, let's see when they finish!
I am Worker 1, I slept for 1 seconds
I am Worker 3, I slept for 4 seconds
I am Worker 4, I slept for 5 seconds
I am Worker 2, I slept for 7 seconds
I am Worker 0, I slept for 9 seconds

So you can see we start 5 threads, they make progress together and when we start the threads (and thus executing the worker function), the operation does not wait for the threads to complete before moving on to the next print statement. So this is an async operation.

In our example, we passed a function to the Thread constructor. But if we wanted we could also subclass it and implement the code as a method (in a more OOP way).

Further Reading:

To know about Threads in details, you can follow these resources:

https://pymotw.com/3/threading/index.html

Global Interpreter Lock (GIL)

The Global Interpreter Lock aka GIL was introduced to make CPython’s memory handling easier and to allow better integrations with C (for example the extensions). The GIL is a locking mechanism that the Python interpreter runs only one thread at a time. That is only one thread can execute Python byte code at any given time. This GIL makes sure that multiple threads DO NOT run in parallel.

Quick facts about the GIL:

One thread can run at a time.
The Python Interpreter switches between threads to allow concurrency.
The GIL is only applicable to CPython (the defacto implementation). Other implementations like Jython, IronPython don’t have GIL.
GIL makes single threaded programs fast.
For I/O bound operations, GIL usually doesn’t harm much.
GIL makes it easy to integrate non thread safe C libraries, thansk to the GIL, we have many high performance extensions/modules written in C.
For CPU bound tasks, the interpreter checks between N ticks and switches threads. So one thread does not block others.

Many people see the GIL as a weakness. I see it as a blessing since it has made libraries like NumPy, SciPy possible which have taken Python an unique position in the scientific communities.

Further Reading:

These resources can help dive deeper into the GIL:

http://www.dabeaz.com/python/UnderstandingGIL.pdf

Processes

To get parallelism, Python introduced the multiprocessing module which provides APIs which will feel very similar if you have used Threading before.

In fact, we will just go and change our previous example. Here’s the modified version that uses Process instead of Thread.


import multiprocessing
import time
import random


def worker(number):
    sleep = random.randrange(1, 10)
    time.sleep(sleep)
    print("I am Worker {}, I slept for {} seconds".format(number, sleep))


for i in range(5):
    t = multiprocessing.Process(target=worker, args=(i,))
    t.start()

print("All Processes are queued, let's see when they finish!")

So what’s changed? I just imported the multiprocessing module instead of threading. And then instead of Thread, I used Process. That’s it, really! Now instead of multi threading, we are using multiple processes which are running on different core of your CPU (assuming you have multiple cores).

With the Pool class, we can also distribute one function execution across multiple processes for different input values. If we take the example from the official docs:

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))

Here, instead of iterating over the list of values and calling f on them one by one, we are actually running the function on different processes. One process executes f(1), another runs f(2) and another runs f(3). Finally the results are again aggregated in a list. This would allow us to break down heavy computations into smaller parts and run them in parallel for faster calculation.

Further Reading:

https://pymotw.com/3/multiprocessing/index.html

The `concurrent.futures` module

The concurrent.futures module packs some really great stuff for writing async codes easily. My favorites are the ThreadPoolExecutor and the ProcessPoolExecutor. These executors maintain a pool of threads or processes. We submit our tasks to the pool and it runs the tasks in available thread/process. A Future object is returned which we can use to query and get the result when the task has completed.

Here’s an example of ThreadPoolExecutor:

from concurrent.futures import ThreadPoolExecutor
from time import sleep
 
def return_after_5_secs(message):
    sleep(5)
    return message
 
pool = ThreadPoolExecutor(3)
 
future = pool.submit(return_after_5_secs, ("hello"))
print(future.done())
sleep(5)
print(future.done())
print(future.result())

I have a blog post on the concurrent.futures module here: http://masnun.com/2016/03/29/python-a-quick-introduction-to-the-concurrent-futures-module.html which might be helpful for exploring the module deeper.

Further Reading:

https://pymotw.com/3/concurrent.futures/

Asyncio - Why, What and How?

You probably have the question many people in the Python community have - What does asyncio bring new to the table? Why did we need one more way to do async I/O? Did we not have threads and processes already? Let’s see!

Why do we need asyncio?

Processes are costly to spawn. So for I/O, Threads are chosen largely. We know that I/O depends on external stuff - slow disks or nasty network lags make I/O often unpredictable. Now, let’s assume that we are using threads for I/O bound operations. 3 threads are doing different I/O tasks. The interpreter would need to switch between the concurrent threads and give each of them some time in turns. Let’s call the threads - T1, T2 and T3. The three threads have started their I/O operation. T3 completes it first. T2 and T1 are still waiting for I/O. The Python interpreter switches to T1 but it’s still waiting. Fine, so it moves to T2, it’s still waiting and then it moves to T3 which is ready and executes the code. Do you see the problem here?

T3 was ready but the interpreter switched between T2 and T1 first - that incurred switching costs which we could have avoided if the interpreter first moved to T3, right?

What is asyncio?

Asyncio provides us an event loop along with other good stuff. The event loop tracks different I/O events and switches to tasks which are ready and pauses the ones which are waiting on I/O. Thus we don’t waste time on tasks which are not ready to run right now.

The idea is very simple. There’s an event loop. And we have functions that run async, I/O operations. We give our functions to the event loop and ask it to run those for us. The event loop gives us back a Future object, it’s like a promise that we will get something back in the future. We hold on to the promise, time to time check if it has a value (when we feel impatient) and finally when the future has a value, we use it in some other operations.

Asyncio uses generators and coroutines to pause and resume tasks. You can read these posts for more details:

How do we use asyncio?

Before we beging, let’s see example codes:

import asyncio
import datetime
import random


async def my_sleep_func():
    await asyncio.sleep(random.randint(0, 5))


async def display_date(num, loop):
    end_time = loop.time() + 50.0
    while True:
        print("Loop: {} Time: {}".format(num, datetime.datetime.now()))
        if (loop.time() + 1.0) >= end_time:
            break
        await my_sleep_func()


loop = asyncio.get_event_loop()

asyncio.ensure_future(display_date(1, loop))
asyncio.ensure_future(display_date(2, loop))

loop.run_forever()

Please note that the async/await syntax is Python 3.5+ only. if we walk through the codes:

We have an async function display_date which takes a number (as an identifier) and the event loop as parameters.
The function has an infinite loop that breaks after 50 secs. But during this 50 sec period, it repeatedly prints out the time and takes a nap. The await function can wait on other async functions (coroutines) to complete.
We pass the function to event loop (using the ensure_future method).
We start running the event loop.

Whenever the await call is made, asyncio understands that the function is probably going to need some time. So it pauses the execution, starts monitoring any I/O event related to it and allows tasks to run. When asyncio notices that paused function’s I/O is ready, it resumes the function.

Making the Right Choice

We have walked through the most popular forms of concurrency. But the question remains - when should choose which one? It really depends on the use cases. From my experience (and reading), I tend to follow this pseudo code:

if io_bound:
    if io_very_slow:
        print("Use Asyncio")
    else:
       print("Use Threads")
else:
    print("Multi Processing")

CPU Bound => Multi Processing
I/O Bound, Fast I/O => Multi Threading
I/O Bound, Slow I/O => Asyncio

October 06, 2016 06:10 AM

Eric Holscher

Semantic Meaning in Authoring Documentation

Semantic Meaning in documentation is the separation of what something is from what it looks like. What we mean and what we display are very different things.

We might want to warn someone in our writing, but we don’t want to think about how to display this. Writing with Semantic Meaning gives us this power.

Semantics

As an example, if we were writing documentation in HTML, we could warn a user with bold:

<b>
    Don't do this, it will break your system!
</b>

This has no semantics. Bold doesn’t mean “warning”, it is simply a way of displaying text. A better example in HTML is:

<span class="warning">
    Don't do this, it will break your system!
</span>

This allows you to write what something is, but not have to worry about what it looks like. Your designer might decide that warnings should be bold and red:

.warning {
    text-format: bold;
    color: red;
}

The important part is that you don’t need to think about what warnings look like, just that it’s a warning.

To go one step further, in reStructuredText we can do:

.. warning:: Don't do this, it will break your system!

Then Sphinx or some other tool will generate the proper HTML or PDF with styles for us. reStructuredText is abstracted away from all of the output formats, so you write in one format, and it transforms it properly into HTML, PDF, XML, or any other format it supports.

This approach allows your designers to work with a systematic and standardized set of class names, generated from the tooling. This keeps all your styles the same, and allows the tool to warn you if you try and represent something that doesn’t exist.

Value of Semantic Documentation

When you write documentation, you form a mental model in your head of the document you’re writing. When you use a powerful tool like reStructuredText, you can think in terms of warnings, references, classes, objects, and other powerful semantic constructs.

You start thinking in terms of nouns that can be represented in your problem domain. You can then encode this model into your document:

.. warning:: Make sure you :term:`instantiate`
             the Response objects before you use it.

You can read more about the :cls:`django.http.Response`
in our :doc:`/api/response` page.

In the above section, I thought Hey, maybe someone doesn’t know what instantiate means. I was able to link to the glossary with :term:. I didn’t have to look up the URL for our glossary and link to that. I didn’t have to think about how to style glossary references. I was able to simply write what I meant, and move on.

When you write with semantics, you can encode more of your mental state into the words you write. Conversely, if you write without semantics, valuable information about your writing is lost.

Semantic information also acts as a type of documentation for our writing. Similar to type systems in programming, they allow you to be explicit about what you’re talking about. When you write documentation about a Response object, it isn’t immediately obvious what that is. When you write about a :cls:`django.http.Response`, it is explicitly defined what you’re talking about.

Note

When you write documentation in Markdown, there is no clear way to represent semantic information. You can make something bold, but you can’t make something a warning.

Please don’t write documentation in Markdown.

Conclusion

Communicating with words is a much different skill than transferring communicating with design. In the process of producing documentation however, they are two sides of the same coin. We have to both write and display information for users, and make it easy for them to understand it.

As an author, you should only need to care about communicating knowledge with words. Writing with semantic meaning allows you to properly seperate communcation with words and design.

You should write in a format that gives you the most semantic meaning possible. This:

Allows you to focus on communicating information, not thinking about what HTML class you need for a concept
Expand your own ability to think about your writing in terms of semantic nouns, allowing you to better structure your thoughts
Allows tooling to raise errors when you try to reference semantic concepts that doesn’t exist (typos, etc.)
Give people updating your documents explicit information about what you’re documenting
Allows your documentation systems to crosslink information and provide a better experience for your user
Allows your designer to apply consistent styles to all types of information

When you have the ability to write with powerful semantic constructs, writing becomes easier and more powerful. If you want to be the most efficient and useful writer, you write in a way that preserves the most of your mental model while writing. You write with a tool that gives you semantic meaning.

October 06, 2016 12:00 AM

October 05, 2016

François Dion

Something For Your Mind, Polymath Podcast episode 2

A is for Anomaly

In this episode, "A is for Anomaly", our first of the alphabetical episodes, we cover financial fraud, the Roman quaestores, outliers, PDFs and EKGs. Bleep... Bleep... Bleep...

"so perhaps this is not the ideal way of keeping track of 15 individuals..."

Something for your mind is available on

art·chiv.es

/'ärt,kīv/

at artchiv.es/s4ym/

Francois Dion
@f_dion

P.S. There is a bit more detail on this podcast as a whole, on linkedin.

October 05, 2016 05:25 PM

Vasudev Ram

Get names and types of a Python module's attributes

By Vasudev Ram

Hi readers,

Today I thought of this simple Python utility while using introspection to look at some modules.

It looks at a module, and for each attribute in it, it tells you the name and type of the attribute. This is useful if you are exploring some new Python module (built-in or third-party), and you want, for example, to know all the functions or methods in it, so that you can further introspect those by printing their docstrings, using the form:

print(module_name.function_or_method_name.__doc__)

because the docstring of a Python function of method, if present, is a nice capsule summary of: its arguments, what it does, and its return value (i.e. its input, processing and output). So with such a docstring, in many cases, a reasonably experienced programmer may not even need to look up the actual Python docs for that function or method, before beginning to use it, thereby saving their time.

So here is the utility:

from __future__ import print_function

# mod_attrs_and_types.py 
# Purpose: To show the attribute names and types 
# of a Python module, to help with learning about it.
# Author: Vasudev Ram
# Copyright 2016 Vasudev Ram
# Web site: https://vasudevram.github.io
# Blog: http://jugad2.blogspot.com
# Product store: https://gumroad.com/vasudevram

import sys

def attrs_and_types(mod_name):

    print('Attributes and their types for module {}:'.format(mod_name))
    print()
    for num, attr in enumerate(dir(eval(mod_name))):
        print("{idx}: {nam:30}  {typ}".format(
            idx=str(num + 1).rjust(4),
            nam=(mod_name + '.' + attr).ljust(30), 
            typ=type(eval(mod_name + '.' + attr))))

attrs_and_types(sys.__name__)

Running it like this:

$ python mod_attrs_and_types.py > out

gave this output:

Attributes and their types for module sys:

   1: sys.__displayhook__             <type 'builtin_function_or_method'>
   2: sys.__doc__                     <type 'str'>
   3: sys.__egginsert                 <type 'int'>
   4: sys.__excepthook__              <type 'builtin_function_or_method'>
   5: sys.__name__                    <type 'str'>
   6: sys.__package__                 <type 'NoneType'>
   7: sys.__plen                      <type 'int'>
   8: sys.__stderr__                  <type 'file'>
   9: sys.__stdin__                   <type 'file'>
  10: sys.__stdout__                  <type 'file'>
  11: sys._clear_type_cache           <type 'builtin_function_or_method'>
  12: sys._current_frames             <type 'builtin_function_or_method'>
  13: sys._getframe                   <type 'builtin_function_or_method'>
  14: sys._mercurial                  <type 'tuple'>
  15: sys.api_version                 <type 'int'>
  16: sys.argv                        <type 'list'>
  17: sys.builtin_module_names        <type 'tuple'>
  18: sys.byteorder                   <type 'str'>
  19: sys.call_tracing                <type 'builtin_function_or_method'>
  20: sys.callstats                   <type 'builtin_function_or_method'>
  21: sys.copyright                   <type 'str'>
  22: sys.displayhook                 <type 'builtin_function_or_method'>
  23: sys.dllhandle                   <type 'int'>
  24: sys.dont_write_bytecode         <type 'bool'>
  25: sys.exc_clear                   <type 'builtin_function_or_method'>
  26: sys.exc_info                    <type 'builtin_function_or_method'>
  27: sys.exc_type                    <type 'NoneType'>
  28: sys.excepthook                  <type 'builtin_function_or_method'>
  29: sys.exec_prefix                 <type 'str'>
  30: sys.executable                  <type 'str'>
  31: sys.exit                        <type 'builtin_function_or_method'>
  32: sys.flags                       <type 'sys.flags'>
  33: sys.float_info                  <type 'sys.float_info'>
  34: sys.float_repr_style            <type 'str'>
  35: sys.getcheckinterval            <type 'builtin_function_or_method'>
  36: sys.getdefaultencoding          <type 'builtin_function_or_method'>
  37: sys.getfilesystemencoding       <type 'builtin_function_or_method'>
  38: sys.getprofile                  <type 'builtin_function_or_method'>
  39: sys.getrecursionlimit           <type 'builtin_function_or_method'>
  40: sys.getrefcount                 <type 'builtin_function_or_method'>
  41: sys.getsizeof                   <type 'builtin_function_or_method'>
  42: sys.gettrace                    <type 'builtin_function_or_method'>
  43: sys.getwindowsversion           <type 'builtin_function_or_method'>
  44: sys.hexversion                  <type 'int'>
  45: sys.long_info                   <type 'sys.long_info'>
  46: sys.maxint                      <type 'int'>
  47: sys.maxsize                     <type 'int'>
  48: sys.maxunicode                  <type 'int'>
  49: sys.meta_path                   <type 'list'>
  50: sys.modules                     <type 'dict'>
  51: sys.path                        <type 'list'>
  52: sys.path_hooks                  <type 'list'>
  53: sys.path_importer_cache         <type 'dict'>
  54: sys.platform                    <type 'str'>
  55: sys.prefix                      <type 'str'>
  56: sys.py3kwarning                 <type 'bool'>
  57: sys.setcheckinterval            <type 'builtin_function_or_method'>
  58: sys.setprofile                  <type 'builtin_function_or_method'>
  59: sys.setrecursionlimit           <type 'builtin_function_or_method'>
  60: sys.settrace                    <type 'builtin_function_or_method'>
  61: sys.stderr                      <type 'file'>
  62: sys.stdin                       <type 'file'>
  63: sys.stdout                      <type 'file'>
  64: sys.subversion                  <type 'tuple'>
  65: sys.version                     <type 'str'>
  66: sys.version_info                <type 'sys.version_info'>
  67: sys.warnoptions                 <type 'list'>
  68: sys.winver                      <type 'str'>

There are other ways to do this, such as using the inspect module, but this is an easy way without inspect.

You can (e)grep for the pattern 'function|method' in the output, to get only the lines you want:

(If you haven't earlier, also check min_fgrep: minimal fgrep command in D.)

$ grep -E "function|method" out

1: sys.__displayhook__             <type 'builtin_function_or_method'>
   4: sys.__excepthook__              <type 'builtin_function_or_method'>
  11: sys._clear_type_cache           <type 'builtin_function_or_method'>
  12: sys._current_frames             <type 'builtin_function_or_method'>
  13: sys._getframe                   <type 'builtin_function_or_method'>
  19: sys.call_tracing                <type 'builtin_function_or_method'>
  20: sys.callstats                   <type 'builtin_function_or_method'>
  22: sys.displayhook                 <type 'builtin_function_or_method'>
  25: sys.exc_clear                   <type 'builtin_function_or_method'>
  26: sys.exc_info                    <type 'builtin_function_or_method'>
  28: sys.excepthook                  <type 'builtin_function_or_method'>
  31: sys.exit                        <type 'builtin_function_or_method'>
  35: sys.getcheckinterval            <type 'builtin_function_or_method'>
  36: sys.getdefaultencoding          <type 'builtin_function_or_method'>
  37: sys.getfilesystemencoding       <type 'builtin_function_or_method'>
  38: sys.getprofile                  <type 'builtin_function_or_method'>
  39: sys.getrecursionlimit           <type 'builtin_function_or_method'>
  40: sys.getrefcount                 <type 'builtin_function_or_method'>
  41: sys.getsizeof                   <type 'builtin_function_or_method'>
  42: sys.gettrace                    <type 'builtin_function_or_method'>
  43: sys.getwindowsversion           <type 'builtin_function_or_method'>
  57: sys.setcheckinterval            <type 'builtin_function_or_method'>
  58: sys.setprofile                  <type 'builtin_function_or_method'>
  59: sys.setrecursionlimit           <type 'builtin_function_or_method'>
  60: sys.settrace                    <type 'builtin_function_or_method'>

You can also (e)grep for a pattern or for alternative patterns:

$ grep -E "std(in|out)" out
   9: sys.__stdin__                   <type 'file'>
  10: sys.__stdout__                  <type 'file'>
  62: sys.stdin                       <type 'file'>
  63: sys.stdout                      <type 'file'>

The image at the top of the post is of a replica of a burning glass owned by Joseph Priestley, in his laboratory. If you don't remember your school physics, he is credited with having discovered oxygen.

- Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates on my software products / ebooks / courses.

Jump to posts: Python DLang xtopdf

Subscribe to my blog by email

My ActiveState recipes

Managed WordPress Hosting by FlyWheel

Share |

October 05, 2016 04:27 AM

October 04, 2016

Doing Math with Python

Trying out the code on Ubuntu 16.04

If you are using Ubuntu 16.04 and don't want to install the Anaconda Python distribution for trying out the book's programs or the sample solutions, this post is for you.

Ubuntu 16.04 already comes with Python 3 installed, so we only need to install the following packages - matplotlib, matplotlib-venn, sympy and idle3.

Open a terminal and do the following:

$ sudo apt-get update
$ sudo apt-get install python3-matplotlib python3-matplotlib-venn python3-sympy idle3

It's worth noting that this will install sympy 0.7.6 and matplotlib 1.5.1 which are both sufficient for the book's programs.

Starting IDLE editor

You can now start the IDLE editor by typing in "idle3" from the terminal and then it's ready for your programs!

Contact

If you find any issues please email me at doingmathwithpython@gmail.com or post your query/tip to any of the following community forums:

October 04, 2016 09:39 PM

Weekly Python Chat

Truthiness

Let's talk about truthiness in Python! Is the opposite of truthiness... falseyness? What is it, why does it matter, and how can you use it?

October 04, 2016 08:30 PM

Planet Python

October 10, 2016

October 09, 2016

What to care about

Brightness

LCD

OLED

Colour gamut

Formats

Where to go for advice

Ultra HD Premium

Why I chose my TV

What I'm announcing

The story behind Which Film

One fateful night ...

You need an API, IMDb

What platform?

Choosing a programming language is never easy for me

But why six years?!?

October 08, 2016

Summary

Brief Introduction

Interview with Lorena Mesa

Keep In Touch

Picks

Links

October 07, 2016

First Question: What about Using Excel or Word?

Question 2: What about Scraping PDFs?

Question 3: Helper Libraries

Comment 1: Back to Scraping

Question 4: About XML Parsing

Question 5: Protecting Bookmarks

Comment 2: Weasyprint

Question 6: unoconv in a Web Server

Comment 3: Converting from Word

Install pillow

Opening Images

Getting Image Information

Cropping Images

Using Filters

Wrapping Up

Related Reading

October 06, 2016

Pro AV on Linux

When

Supported Components

Defining The Terms

Sync vs Async

Concurrency and Parallelism

Quick Recap

Threads & Processes

Threads

Global Interpreter Lock (GIL)

Processes

The concurrent.futures module

Asyncio - Why, What and How?

Why do we need asyncio?

What is asyncio?

How do we use asyncio?

Making the Right Choice

Semantics

Value of Semantic Documentation

Conclusion

October 05, 2016

A is for Anomaly

October 04, 2016

Starting IDLE editor

Contact

The `concurrent.futures` module