Planet Python
Last update: October 08, 2016 09:47 PM
October 08, 2016
Podcast.__init__
Episode 78 - Lorena Mesa
Summary
One of the great strengths of the Python community is the diversity of backgrounds that our practitioners come from. This week Lorena Mesa talks about how her focus on political science and civic engagement led her to a career in software engineering and data analysis. In addition to her professional career she founded the Chicago chapter of PyLadies, helps teach women and kids how to program, and was voted onto the board of the PSF.
Brief Introduction
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable.
- Check out our sponsor Linode for running your awesome new Python apps. Check them out at linode.com/podcastinit and use the promo code podcastinit20 to get a $20 credit to try out their fast and reliable Linux virtual servers for your next project.
- You want to make sure your apps are error-free so give our other sponsor, Rollbar, a look. Rollbar is a service for tracking and aggregating your application errors so that you can find and fix the bugs in your application before your users notice they exist. Use the link rollbar.com/podcastinit to get 90 days and 300,000 errors for free on their bootstrap plan.
- Visit our site to subscribe to our show, sign up for our newsletter, read the show notes, and get in touch.
- By leaving a review on iTunes or Google Play Music, you make it easier for other people to find us.
- Join our community! Visit discourse.pythonpodcast.com to help us grow and connect our wonderful audience.
- Your host as usual is Tobias Macey
- Today we’re interviewing Lorena Mesa about what inspires her in her work as a software engineer and data analyst.
I’m excited to tell you about a new sponsor of the show, Rollbar.
One of the frustrating things about being a developer is dealing with errors… (sigh)
- Relying on users to report errors
- Digging thru log files trying to debug issues
- A million alerts flooding your inbox ruining your day…
With Rollbar’s full-stack error monitoring, you get the context, insights and control you need to find and fix bugs faster. It’s easy to get started tracking the errors and exceptions in your stack. You can start tracking production errors and deployments in 8 minutes or less, and Rollbar works with all major languages and frameworks, including Ruby, Python, JavaScript, PHP, Node, iOS, Android and more. You can integrate Rollbar into your existing workflow, such as sending error alerts to Slack or HipChat, or automatically creating new issues in GitHub, JIRA, Pivotal Tracker, etc.
We have a special offer for Podcast.__init__ listeners. Go to rollbar.com/podcastinit, sign up, and get the Bootstrap Plan free for 90 days. That’s 300,000 errors tracked for free. Rollbar is loved by developers at awesome companies like Heroku, Twilio, Kayak, Instacart, Zendesk, Twitch and more. Help support Podcast.__init__ and give Rollbar a try today. Go to rollbar.com/podcastinit
Interview with Lorena Mesa
- Introductions
- How did you get introduced to Python?
- How did your original interests in political science and community outreach lead to your current role as a software engineer?
- You dedicate a lot of your time to organizations that help teach programming to women and kids. What are some of the most meaningful experiences that you have been able to facilitate?
- Can you talk a bit about your work getting the PyLadies chapter in Chicago off the ground and what the reaction has been like?
- Now that you are a member of the board for the PSF, what are your goals in that position?
- What is it about software development that made you want to change your career path?
- What are some of the most interesting projects that you have worked on, whether for your employer or for fun?
- Do you think that the bootcamp you attended did a good job of preparing you for a position in industry?
- What is your view on the concept that software development is the modern form of literacy? Do you think that everyone should learn how to program?
Keep In Touch
Picks
Links
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
October 07, 2016
Weekly Python StackOverflow Report
(xl) stackoverflow python report
These are the ten highest-rated questions on Stack Overflow last week.
Between brackets: [question score / answers count]
Build date: 2016-10-07 19:11:06 GMT
- How to make an integer larger than any other integer? - [38/7]
- Is there a more Pythonic way to combine an Else: statement and an Except:? - [25/6]
- Why does 1 // 0.1 == 9.0? - [9/1]
- Find the subset of a set of integers that has the maximum product - [8/4]
- How can I write a C function that takes either an int or a float? - [8/3]
- Entries are mirrored throughout list - [8/0]
- Is it possible to def a function with a dotted name in Python? - [7/2]
- Organizing list of tuples - [6/7]
- cryptography AssertionError: sorry, but this version only supports 100 named groups - [6/1]
- Global name 'camera' is not defined in python - [6/1]
Philip Semanchuk
Creating PDF Documents Using LibreOffice and Python, Part 3
This is part 3 of a 4-part series on creating PDFs using LibreOffice. You should read part 1 and part 2 if you haven’t already. This series is a supplement to a talk I gave at PyOhio 2016.
Here in part 3, I review the conversation we (the audience and I) had at the end of the PyOhio talk. I committed the speaker’s cardinal sin of not repeating (into the microphone) the questions people asked, so they’re inaudible in the video. In addition, we had some interesting conversations among multiple people that didn’t get picked up by the microphone. I don’t want them to get lost, so I summarized them here.
The most interesting thing I learned out of this conversation is that LibreOffice can open PDFs; once opened they’re like an ordinary LibreOffice document. You can edit them, save them to ODF, export to PDF, etc. Is this cool, or what?
First Question: What about Using Excel or Word?
One of the attendees jumped in to confirm that modern MS Word formats are XML-based. However, he went on to say, the XML contains a statement at the top that says something like “You cannot legally read the rest of this file”. I made a joke about not having one’s lawyer present when reading the file.
In all seriousness, I can’t find anything online that suggests that Microsoft’s XML contains a warning like that, and the few examples I looked at didn’t have any such warning. If you can shed any light on this, please do so in the comments!
We also discussed the fact that one must invoke the office app (LibreOffice or Word, Excel, etc.) in order to render the document to PDF. LibreOffice has a reputation for performing badly when invoked repeatedly for this purpose. LibreOffice 5 may have addressed some of these problems, but as of this writing it’s still pretty new so the jury is still out on how this will work in practice.
Another attendee noted that Microsoft Office can save to LibreOffice format, so if Word (or Excel) is your document-editing tool of choice, you can still use LibreOffice to render it to PDF. That’s really useful if MS Office is your tool of choice but you’re doing rendering on a BSD/Linux server.
Question 2: What about Scraping PDFs?
The questioner noted that scraping a semi-complex PDF is very painful. It’d be ideal, he said, to be able to take a complex form like the 1040 and extract key value pairs of the question and answer. Is the story getting better for scraping PDFs?
My answer was that for the little experience I have with scraping PDFs, I’ve used PDFMiner, and the attendee said he was using the same.
Someone else chimed in that it’s a great use case for [Amazon’s] Mechanical Turk; in his case he was dealing with old faxes that had been scanned.
Question 3: Helper Libraries
Matt Wilson asked if it would make sense to begin building helper libraries to simplify common tasks related to manipulating LibreOffice XML. My answer was that I wasn’t sure since each project has very specific needs. Someone else suggested that one would have to start learning the spec in order to begin creating abstractions.
In the YouTube comments, Paul Hoffman called our attention to OdfPy, a “thin abstraction over direct XML access”. It looks quite interesting.
Comment 1: Back to Scraping
One of the attendees commented that he had used Jython and PDFBox for PDF scraping. “It took a lot to get started, but once I started to figure out my way around it, it was a pretty good tool and it moved pretty speedily as compared to some of the other tools I used.” He went on to say that it was pretty complete and that it worked very well.
Question 4: About XML Parsing
The question was what I used to parse the XML, and my answer was that I used ElementTree from the standard library. Your favorite XML parsing library will work just fine.
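For instance, a minimal sketch of this kind of parsing (using a simplified stand-in for LibreOffice’s content.xml, with hypothetical short namespace URIs; real ODF namespaces are much longer):

```python
import xml.etree.ElementTree as ET

# A tiny stand-in document; real ODF content.xml is structured similarly
xml = (
    '<office:document xmlns:office="urn:example:office" '
    'xmlns:text="urn:example:text">'
    '<office:body><text:p>Hello</text:p></office:body>'
    '</office:document>'
)

root = ET.fromstring(xml)

# Map a prefix to the namespace URI so queries stay readable
ns = {'text': 'urn:example:text'}
paragraphs = [p.text for p in root.findall('.//text:p', ns)]
print(paragraphs)  # ['Hello']
```

The same pattern works with any of the standard parsers; only the namespace URIs change.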
Question 5: Protecting Bookmarks
The question was whether or not I did anything special to protect the bookmarks in the document. My answer was that I didn’t. (I’m not even sure it’s possible.) If you go through multiple rounds of editing with your client, those invisible bookmarks are inevitably going to get moved or deleted, so expect a little maintenance work related to that.
Comment 2: Weasyprint
One of the attendees commented that Weasyprint is a useful HTML/CSS to PDF converter. My observation was that tools in this class (HTML/CSS to PDF converters) are not as precise as either of the methods I outlined in this talk, but if you don’t need precision they’re probably a nice way to go.
Question 6: unoconv in a Web Server
Can one use unoconv in a Web server? My answer was that it’s possible, but it’s not practical to use it in-process. For me, it worked to do so in a demo of an intranet application, but that’s about as far as you want to go with it. It’s much more practical to use a distributed processing application (Celery, for example).
One of the attendees concurred that it makes sense to spin it off into a separate process, but “unoconv inexplicably crashes when it feels like it”.
Comment 3: Converting from Word
The initial comment was that pandoc might help with converting from Word to LibreOffice. This started a conversation which I’d summarize this way:
- LibreOffice can open MS Office docs, so use that instead of pandoc and save as LibreOffice
- If you open MS Office documents with LibreOffice, double check the formatting because it doesn’t always survive the transition
- LibreOffice can open PDFs for editing.
Mike Driscoll
An Intro to the Python Imaging Library / Pillow
The Python Imaging Library, or PIL, allowed you to do image processing in Python. The original author, Fredrik Lundh, wrote one of my favorite Python blogs when I first started learning Python. However, PIL’s last release was way back in 2009, and the blog also stopped getting updated. Fortunately, some other Python folks came along, forked PIL, and called their project pillow. The pillow project is a drop-in replacement for PIL that also supports Python 3, something PIL never got around to doing.
Please note that you cannot have both PIL and pillow installed at the same time. There are some warnings in their documentation that list some differences between PIL and pillow that get updated from time to time, so I’m just going to direct you there instead of repeating them here since they will likely become out of date.
Install pillow
You can install pillow using pip or easy_install. Here’s an example using pip:
pip install Pillow
Note that if you are on Linux or Mac, you may need to run the command with sudo.
Opening Images

Pillow makes it easy to open an image file and display it. Let’s take a look:
from PIL import Image

image = Image.open('/path/to/photos/jelly.jpg')
image.show()
Here we just import the Image module and ask it to open our file. If you go and read the source, you will see that on Unix, the open method saves the images to a temporary PPM file and opens it with the xv utility. On my Linux machine, it opened it with ImageMagick, for example. On Windows, it will save the image as a temporary BMP and open it in something like Paint.
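As an aside, once you have an Image object you can inspect basic attributes without displaying it at all. Here’s a minimal sketch, using an in-memory image created with Image.new so it runs without a photo on disk (a file opened with Image.open behaves the same way):

```python
from PIL import Image

# An in-memory stand-in image (hypothetical size and color)
image = Image.new('RGB', (640, 480), color=(0, 0, 255))

print(image.size)  # (640, 480) -- a (width, height) tuple
print(image.mode)  # 'RGB'
```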
Getting Image Information
You can get a lot of information about an image using pillow as well. Let’s look at just a few small examples of what we can extract:
>>> from PIL import Image
>>> image = Image.open('/path/to/photos/jelly.jpg')
>>> r, g, b = image.split()
>>> histogram = image.histogram()
>>> histogram
[384761, 489777, 557209, 405004, 220701, 154786, 55807, 35806, 21901, 16242]
>>> exif = image._getexif()
>>> exif
{256: 1935, 257: 3411, 271: u'Panasonic', 272: u'DMC-LX7', 274: 1, 282: (180, 1), 283: (180, 1), 296: 2, 305: u'PaintShop Pro 14.00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 306: u'2016:08:21 07:54:57', 36867: u'2016:08:21 07:54:57', 36868: u'2016:08:21 07:54:57', 37121: '\x01\x02\x03\x00', 37122: (4, 1), 37381: (124, 128), 37383: 5, 37384: 0, 37385: 16, 37386: (47, 10), 40960: '0100', 40961: 1, 40962: 3968, 40963: 2232, 41495: 2, 41728: '\x03', 41729: '\x01', 41985: 0, 41986: 0, 41987: 0, 41988: (0, 10), 41989: 24, 41990: 0, 41991: 0, 41992: 0, 41993: 0, 41994: 0}
In this example, we show how to extract the RGB (red, green, blue) values from the image. We also learn how to get a histogram from the image. Note that I truncated the output a bit as the histogram’s output was much larger. You could graph the histogram using another Python package, such as matplotlib. Finally, the example above demonstrates how to extract the EXIF information from the image. Again, I have shortened the output from this method a bit as it contained way too much information for this article.
Cropping Images
You can also crop images with pillow. It’s actually quite easy, although it may take you a little trial and error to figure it out. Let’s try cropping our jellyfish photo:
>>> from PIL import Image
>>> image = Image.open('/path/to/photos/jelly.jpg')
>>> cropped = image.crop((0, 80, 200, 400))
>>> cropped.save('/path/to/photos/cropped_jelly.png')
You will note that all we need to do is open the image and then call its crop method. You will need to pass in the x/y coordinates that you want to crop to, i.e. (x1, y1, x2, y2). In pillow, the 0 pixel is the top left pixel. As you increase your x value, it goes to the right. As you increase the y value, you go down the image. When you run the code above, you’ll end up with the following image:

That’s a pretty boring crop. I want to crop the jellyfish’s “head”. To get the right coordinates quickly, I used Gimp to help me figure out what coordinates to use for my next crop.
>>> from PIL import Image
>>> image = Image.open('/path/to/photos/jelly.jpg')
>>> cropped = image.crop((177, 882, 1179, 1707))
>>> cropped.save('/path/to/photos/cropped_jelly2.png')
If we run this code, we’ll end up with the following cropped version:

That’s much better!
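The box arithmetic is worth making explicit: the output width is right minus left, and the height is lower minus upper. A small sketch with a hypothetical blank 400x300 image:

```python
from PIL import Image

# A hypothetical 400x300 blank image, just to illustrate the coordinate system
image = Image.new('RGB', (400, 300))

# The box is (left, upper, right, lower), measured from the top-left pixel
box = (50, 50, 250, 200)
cropped = image.crop(box)

print(cropped.size)  # (200, 150): 250 - 50 wide by 200 - 50 tall
```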
Using Filters
Original Jellyfish
There are a variety of filters that you can use in pillow to apply to your images. They are contained in the ImageFilter module. Let’s look at a couple of them here:
>>> from PIL import ImageFilter
>>> from PIL import Image
>>> image = Image.open('/path/to/photos/jelly.jpg')
>>> blurred_jelly = image.filter(ImageFilter.BLUR)
>>> blurred_jelly.save('/path/to/photos/blurry_jelly.png')
This will blur the jellyfish photo slightly. Here’s the result I got:
Blurry Jellyfish
Of course, most people like their images sharper rather than blurrier. Pillow has your back. Here’s one way to sharpen the image:
>>> from PIL import ImageFilter
>>> from PIL import Image
>>> image = Image.open('/path/to/photos/jelly.jpg')
>>> sharpened_jelly = image.filter(ImageFilter.SHARPEN)
>>> sharpened_jelly.save('/path/to/photos/sharper_jelly.png')
When you run this code, you’ll end up with the following:
Sharper Jellyfish
You can also use the ImageEnhance module for sharpening your photos, among other things.
There are other filters that you can apply too, such as DETAIL, EDGE_ENHANCE, EMBOSS, SMOOTH, etc. You can also write your code in such a way that you can apply multiple filters to your image.
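Since each call to filter() returns a new Image, applying multiple filters is just a matter of chaining calls. A minimal sketch, using a blank in-memory image rather than the jellyfish photo:

```python
from PIL import Image, ImageFilter

# A hypothetical blank image; filter() returns a new Image each time,
# so filters can simply be chained
image = Image.new('RGB', (100, 100), 'white')
result = image.filter(ImageFilter.SMOOTH).filter(ImageFilter.EDGE_ENHANCE)

print(result.size)  # the size is unchanged: (100, 100)
```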
You will likely need to download the images above to really be able to compare the differences in the filters.
Wrapping Up
You can do much more with the pillow package than what is covered in this short article. Pillow supports image transforms, processing bands, image enhancements, the ability to print your images and much more. I highly recommend reading the pillow documentation to get a good grasp of everything you can do.
Related Reading
- Python pillow official website
- Pillow documentation
- Wikipedia article on the Python Imaging Library
- PyPI: PIL
Talk Python to Me
#79 Beeware Python Tools
Could you write me a Python app for the wide range of platforms out there? Oh, wait, I want them to be native GUI applications. And I need them on mobile (Android, iOS, tvOS, and watchOS) as well as major desktop apps. I also need them to appear indistinguishable from native apps (be a .app on macOS, .exe on Windows, etc).
What technology would you use for this? This week I'll introduce you to a wide set of small, focused and powerful tools that make all of this, and more, possible. We're speaking with Russell Keith-Magee, founder of the Beeware project.
Links from the show:
- Beeware Project: pybee.org
- Russell Keith-Magee: @freakboy3742
- Beeware on GitHub: github.com/pybee
Organizing areas of the project:
- Tools: pybee.org/project/projects/tools
- Applications: pybee.org/project/projects/applications
- Libraries: pybee.org/project/projects/libraries
- Bridges: pybee.org/project/projects/bridges
- Templates: pybee.org/project/projects/templates
- Support: pybee.org/project/projects/support
October 06, 2016
Robin Parmar
How to retrieve photo metadata in Python
You'd think it would be easy to retrieve and even edit photo metadata. After all, we are living in the 21st century. But, no, some things prove more difficult than they should. A search for applications turned up quite a few that would display the metadata but none that would easily edit it.
OK, so there's always the programmatic approach. And for that I turn to Python. Let's see what the state of the art holds for us. (Hint: It's a bumpy ride.)
First, a couple of constraints. I develop on a Windows 10 machine, largely because that's the same computer all my other goodies are on. Yes, Linux might be better for development, but not for desktop use. (Cue old debate.)
Second, because I am indeed living in this century, I prefer to use Python 3, the version that broke backwards compatibility. It's been around for 8 years, so is not exactly new.
What is metadata?
Metadata is simply a list of strings stored in an image file. These strings are carried along with the image, and can identify the author, camera characteristics, copyright information, and so on. There are two main metadata standards.
Exif, the Exchangeable image file format, has been around since 1995. It works with media such as WAV sound files, TIFF images, and JPG.
IPTC, defined by the International Press Telecommunications Council, is designed to standardise data for news gathering and journalism. There are two main parts, IPTC Core and IPTC Extension.
The remainder of this article will investigate methods of reading this data, in Python.
Take a PIL
When we think of images and Python, we think first of the Python Imaging Library (PIL). Or rather, its more current fork, named Pillow.
You can install this useful library using the simple mechanism of typing at the command line:
pip install Pillow
This works across platforms.
In fact, if pip fails, I usually give up right away. Not because there aren't other ways to install. But if pip fails, it is a good indication that the library is not well maintained. As we shall see.
In any case, here's my test code. It relies on the fact I have defined a path to a good test file.
fn = 'path/to/some/file/tester.jpg'
def test_PIL():
    # test PIL
    from PIL import Image
    from PIL.ExifTags import TAGS

    print( '\n Test of PIL >> \n' )
    img = Image.open(fn)
    info = img._getexif()
    for k, v in info.items():
        nice = TAGS.get(k, k)
        print( '%s (%s) = %s' % (nice, k, v) )
Interrogating the image for Exif information returns a dictionary. We can iterate over this to see all the meta-tags. In this case a useful TAGS dictionary converts the numeric keys to English equivalents. So, instead of wondering what tag 315 means, we know that it is "artist".
Unfortunately, with my test data I noticed problems. (My programme output is at the bottom of this post, for convenience.) First, the "copyright" field contained scrambled text. Second, the "comment" field did not show up at all. This could perhaps be because Pillow reports only Exif and not IPTC. In any case, it is insufficient and unreliable.
Some dead ends
At this point I did a web search and came up with several likely candidates. But they soon proved frustrating.
The library pyexiv2 is deprecated in favour of GExiv2, which is part of the GNOME stack and hence has neither a Windows installer nor any easy way to compile it.
IPTCInfo is recommended in certain blog articles, like this one, which is already out-dated despite being only four years old.
The automatic install for IPTCInfo failed, so I checked and discovered that the last code update was back in 2011. As a single module, it was easy enough to install manually. But then I discovered that it was not at all Python 3 compatible. My attempts to change the code manually ended in failure.
A piece of the Piexif
Piexif has been tested across platforms and has no dependencies. The documentation is a bit terse, but helpfully indicates that the main "load" function returns several dictionaries, plus a byte dump that forms a thumbnail. I wrote my code to avoid this.
def test_piexif():
    # test Piexif
    import piexif

    print( '\n Test of Piexif >>' )
    data = piexif.load(fn)
    for key in ['Exif', '0th', '1st', 'GPS', 'Interop']:
        subdata = data[key]
        print( '\n%s:' % key )
        for k, v in subdata.items():
            print( '%s = %s' % (k, v) )
I really don't know what "0th" and "1st" mean as dictionary names, but it does appear that I get out all of the meta tags I expect. In particular, the tag marked 37510 contains my comment.
Like PIL, this library has a dictionary to map the obscure codes to names. I thought I should interrogate this.
def test_piexif_inspect():
    # display all metadata names
    import piexif

    print( '\n Inspect piexif >>\n' )
    info = piexif.ImageIFD.__dict__
    lines = ['%s = %s' % (v, k) for k, v in info.items()]
    lines.sort()
    for item in lines:
        print(item)
The result is missing a mapping for tag 37510, the very one I want to use!
OK, not such a big deal in this case. But what if I start using other tags and have to decipher the codes manually? Rather annoying.
You will also notice an odd encoding problem. Rather than contain my comment as is, the tag reads...
b'ASCII\x00\x00\x00MY TEST COMMENT!'
The b prefix marks the value as a bytes object rather than a string. The smart thing to do is decode this to a proper text encoding, but then we still have the prefix cruft.
The following will do the trick, but I am again disliking the arbitrary nature of this decoding.
def test_piexif_use():
    import piexif

    print( '\n Usage of piexif >>' )
    data = piexif.load(fn)
    exif = data['Exif']
    # default to empty bytes so decode() is safe even if the tag is missing
    comment = exif.get(37510, b'').decode('UTF-8')
    comment = comment[8:]  # strip the 'ASCII\x00\x00\x00' prefix
    print( comment )
Try exifread
Finally, I stumbled upon the library exifread.
Here again is my test script. As before, I skip past some tags that are going to be long boring byte strings. And I progress in sorted order, just for convenience.
def test_exifread():
    import exifread

    print( '\n Test of exifread >>\n' )
    with open(fn, 'rb') as f:
        exif = exifread.process_file(f)
    for k in sorted(exif.keys()):
        # skip the long, boring byte strings
        if k not in ['JPEGThumbnail', 'TIFFThumbnail', 'Filename', 'EXIF MakerNote']:
            print( '%s = %s' % (k, exif[k]) )
The result? All of the tags I expect are present, in human-readable encoding. It seems that this obscure project is the winner. Some of the more popular libraries need to do some catching up!
Though, one big limitation exists even here. This library does not support editing the tags. For that, you will need to use one of the previous choices and work around the cruft.
Nonetheless, I hope this article saves you the time I unfortunately spent.
Output
Here follows my test output, for reference:
Test of PIL >>
ExifVersion (36864) = b'0230'
ShutterSpeedValue (37377) = (9965784, 1000000)
ExifImageWidth (40962) = 600
DateTimeOriginal (36867) = 2011:06:09 01:20:59
DateTimeDigitized (36868) = 2011:06:09 01:20:59
MaxApertureValue (37381) = (0, 256)
SceneCaptureType (41990) = 0
MeteringMode (37383) = 5
LightSource (37384) = 0
Flash (37385) = 24
FocalLength (37386) = (77, 1)
CFAPattern (41730) = b'\x02\x00\x02\x00\x00\x01\x01\x02'
Make (271) = OLYMPUS IMAGING CORP.
Model (272) = E-P1
Orientation (274) = 1
ExifImageHeight (40963) = 600
Contrast (41992) = 0
Copyright (33432) = Robin Parmar mar
ExposureBiasValue (37380) = (-3, 10)
XResolution (282) = (720000, 10000)
YResolution (283) = (720000, 10000)
ExposureTime (33434) = (1, 1000)
DigitalZoomRatio (41988) = (100, 100)
FocalLengthIn35mmFilm (41989) = 116
ExposureProgram (34850) = 3
ColorSpace (40961) = 65535
BodySerialNumber (42033) = H52502123
ResolutionUnit (296) = 2
WhiteBalance (41987) = 0
GainControl (41991) = 1
Software (305) = Adobe Photoshop CS5 Windows
DateTime (306) = 2011:08:22 21:39:05
LensMake (42035) = Pentax
LensModel (42036) = smc Pentax F A77 Limited
Saturation (41993) = 0
Artist (315) = Robin Parmar
Sharpness (41994) = 0
FileSource (41728) = b'\x03'
CustomRendered (41985) = 0
ExposureMode (41986) = 1
ExifOffset (34665) = 268
ISOSpeedRatings (34855) = 200
Test of Piexif >>
Exif:
36864 = b'0230'
37377 = (9965784, 1000000)
40962 = 600
36867 = b'2011:06:09 01:20:59'
36868 = b'2011:06:09 01:20:59'
37381 = (0, 256)
37510 = b'ASCII\x00\x00\x00MY TEST COMMENT!'
37383 = 5
37384 = 0
37385 = 24
37386 = (77, 1)
41988 = (100, 100)
41986 = 1
40963 = 600
37380 = (-3, 10)
41730 = b'\x02\x00\x02\x00\x00\x01\x01\x02'
33434 = (1, 1000)
41728 = b'\x03'
41989 = 116
34850 = 3
42033 = b'H52502123'
40961 = 65535
41990 = 0
34855 = 200
41987 = 0
41991 = 1
41992 = 0
42035 = b'Pentax'
42036 = b'smc Pentax F A77 Limited'
41993 = 0
41994 = 0
41985 = 0
0th:
283 = (720000, 10000)
296 = 2
34665 = 11444
306 = b'2011:08:22 21:39:05'
270 = b''
271 = b'OLYMPUS IMAGING CORP.'
272 = b'E-P1'
305 = b'Adobe Photoshop CS5 Windows'
274 = 1
33432 = b'Robin Parmar'
282 = (720000, 10000)
315 = b'Robin Parmar'
1st:
513 = 878
514 = 10416
259 = 6
296 = 2
282 = (72, 1)
283 = (72, 1)
GPS:
Interop:
Test of exifread >>
EXIF BodySerialNumber = H52502123
EXIF CVAPattern = [2, 0, 2, 0, 0, 1, 1, 2]
EXIF ColorSpace = Uncalibrated
EXIF Contrast = Normal
EXIF CustomRendered = Normal
EXIF DateTimeDigitized = 2011:06:09 01:20:59
EXIF DateTimeOriginal = 2011:06:09 01:20:59
EXIF DigitalZoomRatio = 1
EXIF ExifImageLength = 600
EXIF ExifImageWidth = 600
EXIF ExifVersion = 0230
EXIF ExposureBiasValue = -3/10
EXIF ExposureMode = Manual Exposure
EXIF ExposureProgram = Aperture Priority
EXIF ExposureTime = 1/1000
EXIF FileSource = Digital Camera
EXIF Flash = Flash did not fire, auto mode
EXIF FocalLength = 77
EXIF FocalLengthIn35mmFilm = 116
EXIF GainControl = Low gain up
EXIF ISOSpeedRatings = 200
EXIF LensMake = Pentax
EXIF LensModel = smc Pentax F A77 Limited
EXIF LightSource = Unknown
EXIF MaxApertureValue = 0
EXIF MeteringMode = Pattern
EXIF Saturation = Normal
EXIF SceneCaptureType = Standard
EXIF Sharpness = Normal
EXIF ShutterSpeedValue = 1245723/125000
EXIF UserComment = MY TEST COMMENT!
EXIF WhiteBalance = Auto
Image Artist = Robin Parmar
Image Copyright = Robin Parmar
Image DateTime = 2011:08:22 21:39:05
Image ExifOffset = 11444
Image ImageDescription =
Image Make = OLYMPUS IMAGING CORP.
Image Model = E-P1
Image Orientation = Horizontal (normal)
Image ResolutionUnit = Pixels/Inch
Image Software = Adobe Photoshop CS5 Windows
Image XResolution = 72
Image YResolution = 72
Thumbnail Compression = JPEG (old-style)
Thumbnail JPEGInterchangeFormat = 878
Thumbnail JPEGInterchangeFormatLength = 10416
Thumbnail ResolutionUnit = Pixels/Inch
Thumbnail XResolution = 72
Thumbnail YResolution = 72
François Dion
Improving your communications: Professional Audio-Video Production on Linux
Pro AV on Linux
Francois Dion
@f_dion
Machinalis
First release of mypy-django
We’re happy to make a release of mypy-django 0.1.1, the first of many!

It’s a collection of type stubs for use with the mypy static type checking tool (and also with other PEP-484 compliant tools). If you’re a Django developer who wants to improve the quality of the documentation and checking of your code, you might be interested. You can take a look at the README file for some examples, or at the annotated version of the Django tutorial for a full project.
Our goal is to be able to annotate (and validate annotations) in our Django projects, and have yet another tool in our software quality arsenal at Machinalis... But given that we’re releasing it under a BSD license, it’s also a tool for everyone else. Feel free to let us know if you’re using it or if we can help you integrate it with your projects.
Support is mainly for Django 1.10, but older versions should work reasonably well.
Supported Components
- HttpRequest and HttpResponse objects
  - Including supporting classes like QueryDict and file objects
- Generic views
- URL resolver
- Other miscellaneous components required by the above (timezones, cookies, ...)
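The stubs build on standard PEP 484 annotations. As a quick illustration of the style of annotation that mypy and other PEP-484 compliant checkers consume (this toy function and its names are invented for the example and don't use the stubs themselves):

```python
from typing import Dict, Optional

# Hypothetical example: render a greeting from request-like parameters.
# A checker such as mypy verifies these annotations statically.
def greeting_page(params: Dict[str, str]) -> str:
    name: Optional[str] = params.get("name")
    return "<h1>Hello, {}!</h1>".format(name or "anonymous")
```

With mypy-django installed, the same kind of annotation can be written against real Django types like HttpRequest and HttpResponse.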
Github: https://github.com/machinalis/mypy-django
Documentation: Check the included README file
Twitter: @machinalis, @dmoisset
Richard Gomes
Strong type checking in Python
This article describes a Python annotation which combines documentation with type checking in order to help Python developers to gain better understanding and control of the code, whilst allowing them to catch mistakes on the spot, as soon as they occur.
Having been a Java developer previously, but extradited to Python by my own choice, I sometimes feel some nostalgia for the old times, when the Java compiler used to tell me about all sorts of stupidities I was doing.
In the Python world no one is stupid, obviously, except probably me, who many times finds myself passing the wrong types of arguments by accident or by pure stupidity, in case you accept the hypothesis that there's any difference between the two situations.
When you are coding your own stuff, chances are that you know very well what is going on. In general, you have the entire bloody API alive and kicking inside your head. But when you are learning some third party software, in particular large frameworks, chances are that your code is called by something you don't understand very well, which decides to pass arguments to your code which you do not have a clue what they are about.
Documentation
Documentation is a good way of sorting out this difficulty. Up-to-date documentation, in particular, is the sort of thing I feel extremely happy about when I have the chance to find it. My mood is being constantly crunched these days, if you understand what I mean.
Outdated documentation is not only useless but also undesirable. Possibly for this reason, some (or many?) people prefer no documentation at all, since absence of information is better than misinformation, they argue.
It's very difficult to keep documentation up-to-date, unless you are forced somehow to do so. Maybe at gun point?
Strong type checking
I'm not in the quest of convincing anyone that strong type checking is good or useful or desirable. Like everything in life, there are pros and cons.
On the other hand, I'd like to present a couple of benefits which keep strong type checking in my wishlist:
* I'd like to have the ability to stop the application as soon as a wrong type is received by a function, or returned by a function to its caller. Stop early, catch mistakes easily, immediately, on the spot.
* I'd like to identify and document the argument types being passed by frameworks to my code - easily, quickly, effectively, without having to turn the Internet upside down every time I want to learn what argument x is about.
Introducing sphinx_typesafe
Doing a bit of research, I found an interesting library called IcanHasTypeCheck (or ICHTC for short), which I ended up rewriting almost from scratch during the last revision and I've renamed it to sphinx_typesafe.
Let me explain the idea:
In the docstring of a function or method, you employ Sphinx-style documentation patterns in order to declare the types associated with arguments.
If your documentation is pristine, the number of arguments in the documentation matches the number of arguments in the function or method definition.
If your logic is pristine, the types of arguments you documented match the types of arguments actually passed to the function or method at runtime, or returned by the function or method to its caller at runtime.
You just need to add an annotation @typesafe before the function or method, and sphinx_typesafe checks if the documentation matches the definition.
If you don't have a clue about the type of an argument, simply guess some unlikely type, say: None. Then run the application, and sphinx_typesafe will interrupt its execution and report that the actual type does not match None. The next step, obviously, is to substitute the actual type for None.
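To make the mechanism concrete, here is a toy sketch of the idea behind such a decorator. This is NOT the actual sphinx_typesafe implementation - it is an invented illustration that only resolves builtin type names from the docstring, whereas the real library handles dotted names like shapes.Point:

```python
import builtins
import functools
import inspect
import re

def typesafe_sketch(func):
    """Toy illustration of a docstring-driven type checker.

    Parses ':type name: typename' lines from the docstring and checks
    the arguments against builtin types at call time.
    """
    doc = func.__doc__ or ""
    declared = dict(re.findall(r":type\s+(\w+)\s*:\s*(\w+)", doc))
    params = list(inspect.signature(func).parameters)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Map positional and keyword arguments to their parameter names.
        bound = dict(zip(params, args))
        bound.update(kwargs)
        for name, typename in declared.items():
            expected = getattr(builtins, typename)
            if name in bound and not isinstance(bound[name], expected):
                raise TypeError("{} must be {}, got {}".format(
                    name, typename, type(bound[name]).__name__))
        return func(*args, **kwargs)
    return wrapper

@typesafe_sketch
def double(n):
    """
    :type n: int
    """
    return n * 2
```

Calling double("oops") then raises a TypeError on the spot, which is exactly the early failure described above.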
Benefits
A small example tells more than several paragraphs.
Imagine that you see some code like this:
import math
def d(p1, p2):
    x = p1.x - p2.x
    y = p1.y - p2.y
    return math.sqrt(x*x + y*y)
Imagine that you had type information about it, like this:
import math
from sphinx_typesafe import typesafe
@typesafe
def d(p1, p2):
    """
    :type p1: shapes.Point
    :type p2: shapes.Point
    :rtype: float
    """
    x = p1.x - p2.x
    y = p1.y - p2.y
    return math.sqrt(x*x + y*y)
Now you are able to understand what this code is about, quickly!
In particular, you are able to tell the domain of types this code is intended to operate on.
When you run this code, if this function receives a shapes.Square instead of a shapes.Point, it will stop immediately. Notice that a shapes.Square may well have components x and y, which would make the function silently return wrong results. Imagine your test cases catching this situation!
So, I hope I have demonstrated the two benefits I was interested in.
Missing Features
Polymorphism
Sometimes I would like to tell that an argument can be a file but also a str. At the moment I can say that the argument can be types.NotImplementedType meaning "any type". But I would like something more precise, like this:
:type f: [file, str]
This is not difficult to implement, actually, but we are not there yet.
Non intrusive
I would like to have a non-intrusive way to turn on type checking and a very cheap way of turning it off, if possible without any code change.
Thinking more about use cases, I guess that type checking is very useful when you are developing and, in particular, when you are running your test suite. You are probably not interested in having the overhead of type checking on production code which was, theoretically, exhaustively tested.
Long story short, I would like to integrate sphinx_typesafe with pytest, so that an automatic decoration of functions and methods would happen automagically and without any code change.
If pytest finds a docstring which happens to contain a Sphinx-style type specification on it, @typesafe is applied to the function or method. That would be really nice! You could also run your code in production without type checking since type checking was never turned on in the first place.
The idea looks great, but my ignorance of pytest internals and my limited time prevent me from going ahead. Maybe in the future!
Python3 support
The sources of sphinx_typesafe itself are ready for Python 3, but sphinx_typesafe does not yet properly handle your sources written in Python 3. It's not difficult to implement, actually: it's just a matter of adjusting one function, but we are not there yet. Maybe you feel compelled to contribute?
More Information
https://pypi.python.org/pypi/sphinx_typesafe
Credits
Thanks to Klaas for the inspiration and his IcanHasTypeCheck (or ICHTC for short).
Import Python
ImportPython Issue 93
Worthy Read

- sublime: We have been sharing Daniel's articles and videos from his YouTube channel https://www.youtube.com/channel/UCI0vQvr9aFn27yR6Ej6n5UA for a while now. Daniel Bader just published his book on Sublime Text for Python developers. Have a look at his book if you are a Sublime Text user. Here is a 30% discount for all ImportPython subscribers.
- flask, SQLAlchemy: You can find lots of reasons to never delete records from your database. The Soft Delete pattern is one of the available options to implement deletions without actually deleting the data. It does this by adding an extra column to your database table(s) that keeps track of the deleted state of each of its rows. This sounds straightforward to implement, and strictly speaking it is, but the complications that derive from the use of soft deletes are far from trivial. In this article I will discuss some of these issues and how I avoid them in Flask and SQLAlchemy based applications.
- data visualization: Comprehensive listing of all data visualization packages, with small code snippets.
- django: Guilherme Caminha explores the utility of the on_commit hook, available from 1.9 onwards, for sequencing part of a time-consuming task in a Django view, with the rest offloaded to an async process.
- Sponsor: Try Hired and get in front of 4,000+ companies with one application. No more pushy recruiters, no more dead-end applications and mismatched companies; Hired puts the power in your hands.
- async-io: Lukasz Langa uses the asyncio source code to explain the event loop, blocking calls, coroutines, tasks, futures, thread pool executors, and process pool executors.
- opensource project: Click is my go-to Python package for creating command line applications. click-man will generate one man page per command of your Click CLI application specified in console_scripts in your setup.py.
- interview: Bryan is a core developer of the Bokeh project, which is a visualization package for Python. He has also helped with the development of Anaconda.
- Flashlight enables you to easily solve for minimum snap trajectories that go through a sequence of waypoints, compute the required control forces along trajectories, execute the trajectories in a physics simulator, and visualize the simulation results.
- opensource project: Church is a library to generate fake data. It's very useful when you need to bootstrap your database.
- Raspberry Pi and Python projects/scripts.
- A simple, fast, extensible Python library for data validation.

Upcoming Conference / User Group Meet

Projects

- tf-agent - 27 Stars, 1 Fork - TensorFlow reinforcement learning agents for OpenAI Gym environments.
- become - 5 Stars, 0 Fork - Make one object become another.
- python-line-api - 4 Stars, 0 Fork - SDK of the LINE Messaging API for Python.
- football-stats - 2 Stars, 0 Fork - Football stats is a system which has the purpose of helping football match analyses. The final goal of the project is to have the capability of analyzing ball and player positions, creating heatmaps and statistics of different actions or situations.
- pytocli - 2 Stars, 0 Fork - A Python lib to generate CLI commands.
- xfce4-system-monitor - 1 Star, 0 Fork - An xfce panel plugin to display the necessary information of the system.
A. Jesse Jiryu Davis
Computer Science For All

MongoDB offered a paid fellowship to two teachers this summer. Jeremy Mellema and Tim Chen worked with the MongoDB Education Team in our office, developing a computer science curriculum based on Python, MongoDB, and other technologies. This fall, they're starting to teach the new class in NYC public high schools in Hell's Kitchen and the Bronx.
I followed them for a day, talking to them and their students and taking pictures. Read the story on the MongoDB Engineering Journal:
Investing In CS4All: Training Teachers and Helping Them Build Curricula
Images © A. Jesse Jiryu Davis
Abu Ashraf Masnun
Async Python: The Different Forms of Concurrency
With the advent of Python 3 we're hearing a lot of buzz about "async" and "concurrency", and one might simply assume that Python recently introduced these concepts/capabilities. But that would be quite far from the truth. We have had async and concurrent operations for quite some time now. Also, many beginners may think that asyncio is the only/best way to do async/concurrent operations. In this post we shall explore the different ways we can achieve concurrency, and their benefits and drawbacks.
Defining The Terms
Before we dive into the technical aspects, it is essential to have some basic understanding of the terms frequently used in this context.
Sync vs Async
In synchronous operations, tasks are executed in sync, one after another. In asynchronous operations, tasks may start and complete independently of each other. One async task may start and continue running while the execution moves on to a new task. Async tasks don’t block operations (that is, make the execution wait for their completion) and usually run in the background.
For example, you have to call a travel agency to book your next vacation. And you need to send an email to your boss before you go on the tour. In synchronous fashion, you would first call the travel agency; if they put you on hold for a moment, you keep waiting and waiting. Once it’s done, you start writing the email to your boss. Here you complete one task after another. But if you are clever, then while you are waiting on hold you could start writing up the email; when they talk to you, you pause writing the email, talk to them, and then resume the email writing. You could also ask a friend to make the call while you finish that email. This is asynchronicity. Tasks don’t block one another.
Concurrency and Parallelism
Concurrency implies that two tasks make progress together. In our previous example, when we considered the async example, we were making progress on both the call with the travel agent and writing the email. This is concurrency.
When we talked about taking help from a friend with the call, in that case both tasks would be running in parallel.
Parallelism is in fact a form of concurrency. But parallelism is hardware dependent. For example if there’s only one core in the CPU, two operations can’t really run in parallel. They just share time slices from the same core. This is concurrency but not parallelism. But when we have multiple cores, we can actually run two or more operations (depending on the number of cores) in parallel.
Quick Recap
So this is what we have realized so far:
- Sync: Blocking operations.
- Async: Non blocking operations.
- Concurrency: Making progress together.
- Parallelism: Making progress in parallel.
Threads & Processes
Python has had Threads for a very long time. Threads allow us to run our operations concurrently. But there was/is a problem with the Global Interpreter Lock (GIL) for which the threading could not provide true parallelism. However, with multiprocessing, it is now possible to leverage multiple cores with Python.
Threads
Let’s see a quick example. In the following code, the worker function will be run on multiple threads, asynchronously and
concurrently.
import threading
import time
import random

def worker(number):
    sleep = random.randrange(1, 10)
    time.sleep(sleep)
    print("I am Worker {}, I slept for {} seconds".format(number, sleep))

for i in range(5):
    t = threading.Thread(target=worker, args=(i,))
    t.start()

print("All Threads are queued, let's see when they finish!")
Here’s a sample output from a run on my machine:
$ python thread_test.py
All Threads are queued, let's see when they finish!
I am Worker 1, I slept for 1 seconds
I am Worker 3, I slept for 4 seconds
I am Worker 4, I slept for 5 seconds
I am Worker 2, I slept for 7 seconds
I am Worker 0, I slept for 9 seconds
So you can see that we start 5 threads and they make progress together. When we start the threads (and thus begin executing the worker function), the operation does not wait for them to complete before moving on to the next print statement. So this is an async operation.
In our example, we passed a function to the Thread constructor. But if we wanted we could also subclass it and implement the code
as a method (in a more OOP way).
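A minimal sketch of that object-oriented variant (this rewrite is mine, with shorter sleeps than the example above purely to keep the demo quick):

```python
import random
import threading
import time

class Worker(threading.Thread):
    """The same worker as above, written as a Thread subclass."""

    def __init__(self, number):
        super().__init__()
        self.number = number

    def run(self):
        # run() is what the thread executes once start() is called.
        sleep = random.uniform(0.1, 0.5)
        time.sleep(sleep)
        print("I am Worker {}, I slept for {:.2f} seconds".format(
            self.number, sleep))

threads = [Worker(i) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for all workers to finish
```

Note that you override run(), but you call start(); calling run() directly would execute the code in the current thread.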
Further Reading:
To know about Threads in detail, you can follow these resources:
Global Interpreter Lock (GIL)
The Global Interpreter Lock, aka GIL, was introduced to make CPython’s memory handling easier and to allow better integration with C (for example, extensions). The GIL is a locking mechanism which ensures that the Python interpreter runs only one thread at a time: only one thread can execute Python bytecode at any given time. The GIL makes sure that multiple threads DO NOT run in parallel.
Quick facts about the GIL:
- One thread can run at a time.
- The Python Interpreter switches between threads to allow concurrency.
- The GIL is only applicable to CPython (the de facto implementation). Other implementations like Jython and IronPython don’t have a GIL.
- The GIL makes single threaded programs fast.
- For I/O bound operations, the GIL usually doesn’t harm much.
- The GIL makes it easy to integrate non thread safe C libraries; thanks to the GIL, we have many high performance extensions/modules written in C.
- For CPU bound tasks, the interpreter checks every N ticks and switches threads. So one thread does not block others.
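In Python 3, the tick-based check was replaced with a time-based switch interval, which you can inspect and tune at runtime (a small illustration; the 0.005-second value is just an arbitrary choice for the demo):

```python
import sys

# Python 3 switches between threads based on elapsed time,
# not bytecode tick counts as in Python 2.
default = sys.getswitchinterval()
print("Default switch interval: {} seconds".format(default))

sys.setswitchinterval(0.005)    # request more frequent switching
print("New switch interval: {} seconds".format(sys.getswitchinterval()))

sys.setswitchinterval(default)  # restore the default
```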
Many people see the GIL as a weakness. I see it as a blessing, since it has made libraries like NumPy and SciPy possible, which have given Python a unique position in the scientific communities.
Further Reading:
These resources can help dive deeper into the GIL:
Processes
To get parallelism, Python introduced the multiprocessing module which provides APIs which will feel very similar if you have used
Threading before.
In fact, we will just go and change our previous example. Here’s the modified version that uses Process instead of Thread.
import multiprocessing
import time
import random

def worker(number):
    sleep = random.randrange(1, 10)
    time.sleep(sleep)
    print("I am Worker {}, I slept for {} seconds".format(number, sleep))

for i in range(5):
    t = multiprocessing.Process(target=worker, args=(i,))
    t.start()

print("All Processes are queued, let's see when they finish!")
So what’s changed? I just imported the multiprocessing module instead of threading. And then, instead of Thread, I used Process. That’s it, really! Now instead of multithreading, we are using multiple processes which run on different cores of your CPU (assuming you have multiple cores).
With the Pool class, we can also distribute one function execution across multiple processes for different input values. If we
take the example from the official docs:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))
Here, instead of iterating over the list of values and calling f on them one by one, we are actually running the function on
different processes. One process executes f(1), another runs f(2) and another runs f(3). Finally the results are again
aggregated in a list. This would allow us to break down heavy computations into smaller parts and run them in parallel for faster
calculation.
Further Reading:
The concurrent.futures module
The concurrent.futures module packs some really great stuff for writing async code easily. My favorites are the ThreadPoolExecutor and the ProcessPoolExecutor. These executors maintain a pool of threads or processes. We submit our tasks to the pool and it runs them in an available thread/process. A Future object is returned, which we can use to query and get the result when the task has completed.
Here’s an example of ThreadPoolExecutor:
from concurrent.futures import ThreadPoolExecutor
from time import sleep

def return_after_5_secs(message):
    sleep(5)
    return message

pool = ThreadPoolExecutor(3)

future = pool.submit(return_after_5_secs, "hello")
print(future.done())
sleep(5)
print(future.done())
print(future.result())
I have a blog post on the concurrent.futures module here: http://masnun.com/2016/03/29/python-a-quick-introduction-to-the-concurrent-futures-module.html
which might be helpful for exploring the module deeper.
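The executors also offer a map method, which works like the builtin map but runs the calls in the pool (a small sketch; square is an invented helper for the example):

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    return n * n

# Executor.map schedules one call per input item and yields the
# results in input order, regardless of which call finishes first.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(square, [1, 2, 3, 4]))

print(results)  # [1, 4, 9, 16]
```

Using the executor as a context manager, as here, shuts the pool down cleanly when the block exits.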
Further Reading:
Asyncio - Why, What and How?
You probably have the question many people in the Python community have - What does asyncio bring new to the table? Why did we need one more way to do async I/O? Did we not have threads and processes already? Let’s see!
Why do we need asyncio?
Processes are costly to spawn, so threads are largely the choice for I/O. We know that I/O depends on external stuff - slow disks or nasty network lags often make I/O unpredictable. Now, let’s assume that we are using threads for I/O bound operations. Three threads
are doing different I/O tasks. The interpreter would need to switch between the concurrent threads and give each of them some time
in turns. Let’s call the threads - T1, T2 and T3. The three threads have started their I/O operation. T3 completes it first.
T2 and T1 are still waiting for I/O. The Python interpreter switches to T1 but it’s still waiting. Fine, so it moves to T2,
it’s still waiting and then it moves to T3 which is ready and executes the code. Do you see the problem here?
T3 was ready but the interpreter switched between T2 and T1 first - that incurred switching costs which we could have avoided
if the interpreter first moved to T3, right?
What is asyncio?
Asyncio provides us an event loop along with other good stuff. The event loop tracks different I/O events and switches to tasks which are ready and pauses the ones which are waiting on I/O. Thus we don’t waste time on tasks which are not ready to run right now.
The idea is very simple. There’s an event loop. And we have functions that run async I/O operations. We give our functions to the event loop and ask it to run them for us. The event loop gives us back a Future object; it’s like a promise that we will get something back in the future. We hold on to the promise, check from time to time whether it has a value (when we feel impatient), and finally, when the future has a value, we use it in some other operations.
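That promise-like flow can be sketched in a few lines (compute is an invented coroutine for this example; new_event_loop is used so the sketch works regardless of any existing loop):

```python
import asyncio

async def compute():
    # Pretend this is a slow I/O operation.
    await asyncio.sleep(0.1)
    return 42

loop = asyncio.new_event_loop()
# ensure_future wraps the coroutine in a Task, a Future subclass.
future = asyncio.ensure_future(compute(), loop=loop)
result = loop.run_until_complete(future)  # block until the future has a value
loop.close()
print(result)
```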
Asyncio uses generators and coroutines to pause and resume tasks. You can read these posts for more details:
- http://masnun.com/2015/11/20/python-asyncio-future-task-and-the-event-loop.html
- http://masnun.com/2015/11/13/python-generators-coroutines-native-coroutines-and-async-await.html
How do we use asyncio?
Before we begin, let’s look at some example code:
import asyncio
import datetime
import random

async def my_sleep_func():
    await asyncio.sleep(random.randint(0, 5))

async def display_date(num, loop):
    end_time = loop.time() + 50.0
    while True:
        print("Loop: {} Time: {}".format(num, datetime.datetime.now()))
        if (loop.time() + 1.0) >= end_time:
            break
        await my_sleep_func()

loop = asyncio.get_event_loop()
asyncio.ensure_future(display_date(1, loop))
asyncio.ensure_future(display_date(2, loop))
loop.run_forever()
Please note that the async/await syntax is Python 3.5+ only. If we walk through the code:
- We have an async function display_date which takes a number (as an identifier) and the event loop as parameters.
- The function has an infinite loop that breaks after 50 secs. But during this 50 sec period, it repeatedly prints out the time and takes a nap. The await keyword can wait on other async functions (coroutines) to complete.
- We pass the function to the event loop (using the ensure_future function).
- We start running the event loop.
Whenever the await call is made, asyncio understands that the function is probably going to need some time. So it pauses the execution, starts monitoring any I/O events related to it, and allows other tasks to run. When asyncio notices that the paused function’s I/O is ready, it resumes the function.
Making the Right Choice
We have walked through the most popular forms of concurrency. But the question remains - when should we choose which one? It really depends on the use case. From my experience (and reading), I tend to follow this pseudo code:
if io_bound:
    if io_very_slow:
        print("Use Asyncio")
    else:
        print("Use Threads")
else:
    print("Multi Processing")
- CPU Bound => Multi Processing
- I/O Bound, Fast I/O => Multi Threading
- I/O Bound, Slow I/O => Asyncio
October 05, 2016
François Dion
Something For Your Mind, Polymath Podcast episode 2
A is for Anomaly
"so perhaps this is not the ideal way of keeping track of 15 individuals..."
Vasudev Ram
Get names and types of a Python module's attributes
By Vasudev Ram
Hi readers,
Today I thought of this simple Python utility while using introspection to look at some modules.
It looks at a module, and for each attribute in it, it tells you the name and type of the attribute. This is useful if you are exploring some new Python module (built-in or third-party), and you want, for example, to know all the functions or methods in it, so that you can further introspect those by printing their docstrings, using the form:
print(module_name.function_or_method_name.__doc__)

because the docstring of a Python function or method, if present, is a nice capsule summary of its arguments, what it does, and its return value (i.e. its input, processing and output). So with such a docstring, in many cases, a reasonably experienced programmer may not even need to look up the actual Python docs for that function or method before beginning to use it, thereby saving time.
So here is the utility:
from __future__ import print_function

# mod_attrs_and_types.py
# Purpose: To show the attribute names and types
# of a Python module, to help with learning about it.
# Author: Vasudev Ram
# Copyright 2016 Vasudev Ram
# Web site: https://vasudevram.github.io
# Blog: http://jugad2.blogspot.com
# Product store: https://gumroad.com/vasudevram

import sys

def attrs_and_types(mod_name):
    print('Attributes and their types for module {}:'.format(mod_name))
    print()
    for num, attr in enumerate(dir(eval(mod_name))):
        print("{idx}: {nam:30} {typ}".format(
            idx=str(num + 1).rjust(4),
            nam=(mod_name + '.' + attr).ljust(30),
            typ=type(eval(mod_name + '.' + attr))))

attrs_and_types(sys.__name__)

Running it like this:
$ python mod_attrs_and_types.py > out

gave this output (there are other ways to do this, such as using the inspect module, but this is an easy way without inspect):

Attributes and their types for module sys:
1: sys.__displayhook__ <type 'builtin_function_or_method'>
2: sys.__doc__ <type 'str'>
3: sys.__egginsert <type 'int'>
4: sys.__excepthook__ <type 'builtin_function_or_method'>
5: sys.__name__ <type 'str'>
6: sys.__package__ <type 'NoneType'>
7: sys.__plen <type 'int'>
8: sys.__stderr__ <type 'file'>
9: sys.__stdin__ <type 'file'>
10: sys.__stdout__ <type 'file'>
11: sys._clear_type_cache <type 'builtin_function_or_method'>
12: sys._current_frames <type 'builtin_function_or_method'>
13: sys._getframe <type 'builtin_function_or_method'>
14: sys._mercurial <type 'tuple'>
15: sys.api_version <type 'int'>
16: sys.argv <type 'list'>
17: sys.builtin_module_names <type 'tuple'>
18: sys.byteorder <type 'str'>
19: sys.call_tracing <type 'builtin_function_or_method'>
20: sys.callstats <type 'builtin_function_or_method'>
21: sys.copyright <type 'str'>
22: sys.displayhook <type 'builtin_function_or_method'>
23: sys.dllhandle <type 'int'>
24: sys.dont_write_bytecode <type 'bool'>
25: sys.exc_clear <type 'builtin_function_or_method'>
26: sys.exc_info <type 'builtin_function_or_method'>
27: sys.exc_type <type 'NoneType'>
28: sys.excepthook <type 'builtin_function_or_method'>
29: sys.exec_prefix <type 'str'>
30: sys.executable <type 'str'>
31: sys.exit <type 'builtin_function_or_method'>
32: sys.flags <type 'sys.flags'>
33: sys.float_info <type 'sys.float_info'>
34: sys.float_repr_style <type 'str'>
35: sys.getcheckinterval <type 'builtin_function_or_method'>
36: sys.getdefaultencoding <type 'builtin_function_or_method'>
37: sys.getfilesystemencoding <type 'builtin_function_or_method'>
38: sys.getprofile <type 'builtin_function_or_method'>
39: sys.getrecursionlimit <type 'builtin_function_or_method'>
40: sys.getrefcount <type 'builtin_function_or_method'>
41: sys.getsizeof <type 'builtin_function_or_method'>
42: sys.gettrace <type 'builtin_function_or_method'>
43: sys.getwindowsversion <type 'builtin_function_or_method'>
44: sys.hexversion <type 'int'>
45: sys.long_info <type 'sys.long_info'>
46: sys.maxint <type 'int'>
47: sys.maxsize <type 'int'>
48: sys.maxunicode <type 'int'>
49: sys.meta_path <type 'list'>
50: sys.modules <type 'dict'>
51: sys.path <type 'list'>
52: sys.path_hooks <type 'list'>
53: sys.path_importer_cache <type 'dict'>
54: sys.platform <type 'str'>
55: sys.prefix <type 'str'>
56: sys.py3kwarning <type 'bool'>
57: sys.setcheckinterval <type 'builtin_function_or_method'>
58: sys.setprofile <type 'builtin_function_or_method'>
59: sys.setrecursionlimit <type 'builtin_function_or_method'>
60: sys.settrace <type 'builtin_function_or_method'>
61: sys.stderr <type 'file'>
62: sys.stdin <type 'file'>
63: sys.stdout <type 'file'>
64: sys.subversion <type 'tuple'>
65: sys.version <type 'str'>
66: sys.version_info <type 'sys.version_info'>
67: sys.warnoptions <type 'list'>
68: sys.winver <type 'str'>
You can (e)grep for the pattern 'function|method' in the output, to get only the lines you want:
(If you haven't earlier, also check min_fgrep: minimal fgrep command in D.)
$ grep -E "function|method" out
1: sys.__displayhook__ <type 'builtin_function_or_method'>
4: sys.__excepthook__ <type 'builtin_function_or_method'>
11: sys._clear_type_cache <type 'builtin_function_or_method'>
12: sys._current_frames <type 'builtin_function_or_method'>
13: sys._getframe <type 'builtin_function_or_method'>
19: sys.call_tracing <type 'builtin_function_or_method'>
20: sys.callstats <type 'builtin_function_or_method'>
22: sys.displayhook <type 'builtin_function_or_method'>
25: sys.exc_clear <type 'builtin_function_or_method'>
26: sys.exc_info <type 'builtin_function_or_method'>
28: sys.excepthook <type 'builtin_function_or_method'>
31: sys.exit <type 'builtin_function_or_method'>
35: sys.getcheckinterval <type 'builtin_function_or_method'>
36: sys.getdefaultencoding <type 'builtin_function_or_method'>
37: sys.getfilesystemencoding <type 'builtin_function_or_method'>
38: sys.getprofile <type 'builtin_function_or_method'>
39: sys.getrecursionlimit <type 'builtin_function_or_method'>
40: sys.getrefcount <type 'builtin_function_or_method'>
41: sys.getsizeof <type 'builtin_function_or_method'>
42: sys.gettrace <type 'builtin_function_or_method'>
43: sys.getwindowsversion <type 'builtin_function_or_method'>
57: sys.setcheckinterval <type 'builtin_function_or_method'>
58: sys.setprofile <type 'builtin_function_or_method'>
59: sys.setrecursionlimit <type 'builtin_function_or_method'>
60: sys.settrace <type 'builtin_function_or_method'>
You can also (e)grep for a pattern or for alternative patterns:
$ grep -E "std(in|out)" out
9: sys.__stdin__ <type 'file'>
10: sys.__stdout__ <type 'file'>
62: sys.stdin <type 'file'>
63: sys.stdout <type 'file'>
The image at the top of the post is of a replica of a burning glass owned by Joseph Priestley, in his laboratory. If you don't remember your school physics, he is credited with having discovered oxygen.
- Enjoy.
- Vasudev Ram - Online Python training and consulting Get updates on my software products / ebooks / courses. Jump to posts: Python DLang xtopdf Subscribe to my blog by email My ActiveState recipes Managed WordPress Hosting by FlyWheel
October 04, 2016
Doing Math with Python
Trying out the code on Ubuntu 16.04
If you are using Ubuntu 16.04 and don't want to install the Anaconda Python distribution for trying out the book's programs or the sample solutions, this post is for you.
Ubuntu 16.04 already comes with Python 3 installed, so we only need to install the following packages - matplotlib, matplotlib-venn, sympy and idle3.
Open a terminal and do the following:
$ sudo apt-get update
$ sudo apt-get install python3-matplotlib python3-matplotlib-venn python3-sympy idle3
It's worth noting that this will install sympy 0.7.6 and matplotlib 1.5.1 which are both sufficient for the book's programs.
Starting IDLE editor
You can now start the IDLE editor by typing in "idle3" from the terminal and then it's ready for your programs!
Contact
If you find any issues please email me at doingmathwithpython@gmail.com or post your query/tip to any of the following community forums:
Weekly Python Chat
Truthiness
Let's talk about truthiness in Python! Is the opposite of truthiness... falseyness? What is it, why does it matter, and how can you use it?
Łukasz Langa
RE: Diversity on the Python sprint in September
As an organizer, I feel compelled to respond to the accusations about this event not being diverse enough.
Martijn Faassen
Morepath 0.16 released!
I'm proud to announce the release of Morepath 0.16. Morepath_ is a Python web framework that is easy to use and lightweight but grows with you when your project demands more.
Morepath 0.16 is one of the biggest releases of Morepath in a while. I want to discuss a few of the highlights of this release here.
Reg Rewritten
Morepath uses the predicate dispatch library Reg for its view lookup system and other behavior. We've rewritten Reg once again. For most Morepath users nothing changes, except that Reg is faster which also makes Morepath faster. If you want to use Reg directly, the new registration API makes it easier to use.
With Reg you can control the context in which dispatch takes place: this allows multiple separate configurations of dispatch in the same runtime. To control context, previously we used an implicit global lookup object, or an explicit but not very Pythonic lookup argument. Those are all gone. If you need multiple dispatch contexts in an application, you can define dispatch methods which derive their context from their class. This change allowed us to simplify Reg considerably and increase its performance.
This work was done by Stefano Taschini in collaboration with myself. Thanks Stefano!
New pip-based build system
This only affects us Morepath developers, but it's a significant change, so I want to highlight it here. We have a nice core team of contributors now and I hope we can attract more, after all.
I've been a happy buildout user over the years, so of course I used it for Morepath's development setup as well. But for a Python-only project like Morepath, pip can now do what buildout does. Since many more Python programmers are familiar with pip, and we want to make it as easy as possible for someone to start contributing, we've taken the plunge and entirely replaced buildout with pip. Even a buildout guy such as myself has been appreciating the results.
We've updated our developer documentation to reflect the changes, so it's easy to find how to do common things. The build environments for the Reg and Dectate libraries were updated to use pip as well.
This work was done by Henri Hulski. Thanks Henri!
Other significant changes
- I took a good look at Traject's routing system with an eye on performance and refactored it.
- We realized that the directive directive was a bit too magic for its own good. I changed Dectate so that new Morepath configuration directives are now defined directly on the App class using the dectate.directive function. This breaks some code if you define new directives, but it's easy to fix.
- Our extensive documentation has had a reorganization of its table of contents.
Look at the detailed changelog for much more information, including upgrade notes.
Performance increase
I benchmarked Morepath quite frequently during this development cycle. To make benchmarking easier, I created a new benchmarking tool called howareyou. It can not only benchmark Morepath, but can also benchmark other web frameworks -- Michael Merickel has in fact been using it already to help optimize Pyramid. You can find the howareyou tool here . The origins of this tool ultimately go back to work by wheezy.web creator Andriy Kornatskyy.
Morepath uses Webob for its request and response implementation. I learned quite a lot about Webob performance characteristics during this development cycle. This allowed me to make performance tweaks in Morepath.
It also let me detect that Webob's development version had some performance regressions that affected both Pyramid and Morepath. I'm very grateful to Bert Regeer for picking up so quickly and thoroughly on my reports of performance problems in Webob, and the Webob development version is currently actually slightly faster than release 1.6.1.
I talked about Morepath's performance history recently in my article Is Morepath Fast yet?. There we had peaked at about 19000 requests per second (on a synthetic benchmark) for the development version. I am happy to announce that we've managed to increase performance even more in our 0.16 release. It's now more than 28000 requests per second!
Let's compare Morepath with some other carefully selected frameworks:
Cool, Morepath 0.16 is actually faster than Pyramid at this point in time! I don't expect it to last long given that the Pyramid devs are already using howareyou to optimize Pyramid, but it's nice to have such a moment. And I deliberately didn't include Falcon, Bottle or wheezy.web in this comparison, as that would rather spoil the effect. Do remember these are somewhat silly, synthetic benchmarks. It's rare indeed that Python web framework overhead affects real world performance, but at least Morepath isn't the slowest one, right?
Enjoy!
I hope you all enjoy the fresh new release. Do get in touch with us!
Mike Driscoll
Python 101/201 Educational Giveaway
I think it’s very important for teens and college students to learn how to program. Science, Technology, Engineering and Mathematics are such a crucial set of topics to learn about that I have decided to give away copies of Python 101 and Python 201: Intermediate Python to teachers and professors, starting today until 11:59 p.m. CST on October 14th, 2016.
Students with valid educational email addresses can also enter to get a free copy of the eBooks, but they will not be eligible for the paperbacks.
How to Get a Copy
Just leave a comment or contact me via my contact form and tell me why you want a copy. I do require some kind of proof that you’re an educator. If you can leave a comment or send me an email via the contact form using an official email address (such as an *.edu domain) or link me to some other proof (LinkedIn, your profile on a school website, etc), that would be great.
Prizes
- Everyone who enters with a valid educational email or other type of proof will receive an eBook copy of Python 101 and Python 201.
- 5 lucky winners will get a copy of the paperback version of Python 201: Intermediate Python + the above
- The Grand Prize will be a paperback copy of Python 101 and Python 201: Intermediate Python + the eBook copies
Deadline
Get your comment or send in an email via the contact form before October 14th, 2016 at 11:59 p.m. CST and you will be entered. I’ll go through the entries and contact the winners.
Gocept Weblog
Zope Resurrection Part 2 – Defibrillation
After reanimation we started defibrillation of Zope and … it kinda worked:
On our sprint we got the following things done to help Zope in the Python 3 wonderland:
- Release zope.testbrowser 5.0 which is compatible with Python 3 and no longer uses mechanize.
- PullRequest for ExtensionClass with the Python 3 port of the C extension.
- five.globalrequest is now compatible with Python 3.
- PullRequest for zope.globalrequest to make it compatible with Python 3.
- Clean-up of the documentation of the ZopeToolKit (ZTK).
As grok builds on the ZTK, it is a beneficiary of the reanimation. The following steps have been undertaken to lead it to the Python 3 wonderland:
- Work has started on 16 packages to reach Python 3 compatibility.
- The general testsuite of grok in groktoolkit is already running under Python 3.
- The testsuites of grokcore.annotation, grokcore.catalog, grokcore.chameleon, grokcore.component, grokcore.content, grokcore.view, grokcore.traverser in groktoolkit show a green bar under Python 3.
- An additional optionflag for doctests has been introduced in zope.testing to help with the rewrite of assertions of Exceptions in tracebacks, which is also valuable for porting other projects with a strong focus on doctests.
We have had a discussion about the broader future of Zope:
- There could be more optional dependencies like ZServer in Zope 4.
- The ZMI could be removed altogether because it has not been maintained for years. It should not be used by applications built on top of Zope anyway. Plone even suggests blocking public access to the ZMI.
- There is a road map needed for Zope 4 so the Plone community can pick it up as decided in the Zope 4 PLIP.
Conclusion: Zope is not dead. At the sprint there were nearly 20 people who use Zope for their daily work. Some of them even joined the sprint in their spare time, without the backing of a company. Yes, it will take time and effort to keep Zope alive and make it prosper in the Python 3 wonderland, but Zope is still needed and has its place among web frameworks.
We at gocept will keep Zope as part of our supported technology stack in projects where it fits the purpose, and will offer help to others who need to migrate a long term project into the future. We will be at PyCon DE in Munich at the end of October and will be open for questions and further discussions. Do not hesitate to talk to us.
Daniel Bader
The Complete Guide to Setting up Sublime Text for Python Developers – Now Available!
The Complete Guide to Setting up Sublime Text for Python Developers – Now Available!
Hey folks, I’m super excited to announce the launch of my first book – It’s called “The Complete Guide to Setting up Sublime Text for Python Developers”.
It’s a detailed, step-by-step guidebook aimed at getting you to a kickass, professional-grade Python development setup built around Sublime Text in the shortest amount of time possible.
I created this because I’ve been using Sublime Text for almost four years now in my Python workflow and I think it’s an amazing combo.
However I kept getting so many emails and questions about this development setup when I used it in my screencasts.
That made me realize how difficult it can be to set up an enjoyable Python development environment – and I decided to do something about it by writing the ULTIMATE setup guide for Sublime Text + Python 😃.
If you want to become a better and more productive developer then this guide is really going to help you get more out of your Python workflow.
Check out SublimeTextPython.com to see what it’s all about! Thanks so much for your support! Enjoy the guide and let me know what you think!
October 03, 2016
Dataquest
Working with SQLite Databases using Python and Pandas
SQLite is a database engine that makes it simple to store and work with relational data. Much like the csv format, SQLite stores data in a single file that can be easily shared with others. Most programming languages and environments have good support for working with SQLite databases. Python is no exception, and a library to access SQLite databases, called sqlite3, has been included with Python since version 2.5. In this post, we’ll walk through how to use sqlite3 to create, query, and update databases. We’ll also cover how to simplify working with SQLite databases using the pandas package. We’ll be using Python 3.5, but this same approach should work with Python 2.
Before we get started, let’s take a quick look at the data we’ll be working with. We’ll be looking at airline flight data, which contains information on airlines, airports, and routes between airports. Each route represents a repeated flight that an airline flies between a source and a destination airport.
All of the data is in a SQLite database called flights.db, which contains three tables – airports, airlines, and routes. You can download the data here.
Here are two rows from the airlines table:
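To give a concrete feel for the workflow described above, here is a minimal sketch combining sqlite3 and pandas. It uses an in-memory stand-in for flights.db with an illustrative airlines schema (the real file and its exact columns come from the download link above):

```python
import sqlite3

import pandas as pd

# Build a tiny in-memory stand-in for flights.db; the real database
# and its exact schema come from the post's download link.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airlines (id INTEGER, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO airlines VALUES (?, ?, ?)",
    [(1, "Private flight", None), (2, "135 Airways", "United States")],
)
conn.commit()

# Plain sqlite3: cursors return rows as tuples.
cur = conn.cursor()
cur.execute("SELECT * FROM airlines LIMIT 2;")
rows = cur.fetchall()
print(rows)

# pandas: read the same query straight into a DataFrame,
# which keeps column names and types for you.
df = pd.read_sql_query("SELECT * FROM airlines;", conn)
print(df.shape)

conn.close()
```

The pandas route is usually the more convenient one for analysis, since you get labeled columns instead of bare tuples.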
Kay Hayen
Nuitka Release 0.5.23
This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.
This release focuses on optimization. The most significant part for users is enhanced scalability due to reduced memory usage, but there are also breakthrough structural improvements for static analysis of iterators and the debut of type shapes and value shapes, giving way to "shape tracing".
Bug Fixes
Fix support for Python 3.5.2 coroutine changes. The checks were added in improved mode for older 3.5.x; the new protocol is only supported when run with that version or higher.
Fix, was falsely optimizing away unused iterations for non-iterable compile time constants.
iter(1) # needs to raise.
Python3: Fix, eval must not attempt to strip memoryviews. This was preventing it from working with that type.
Fix, calling type without any arguments was crashing the compiler. Also, the exception raised for anything but 1 or 3 arguments claimed that only 3 arguments were allowed, which is not the compatible behavior.
Python3.5: Fix, follow enhanced error checking for complex call handling of star arguments.
Compatibility: The from x import x, y re-formulation was doing two __import__ calls instead of re-using the module value.
Optimization
Uses only about 66% of the memory compared to the last release, which is a very important step for scalability, independent of re-loading. This was achieved by making sure to break loop traces and their reference cycle when they become unused.
Properly detect the len of multiplications at compile time from newly introduced value shapes, so that e.g. this is statically optimized.
print(len("*" * 10000000000))
Due to newly introduced type shapes, len and iter now properly detect more often if values will raise or not, and warn about detected raises.
iter(len(something))  # Will always raise
Due to newly introduced "iterator tracing", we can now properly detect if the length of an unpacking matches its source or not. This allows removing the checks of the generic re-formulations of unpackings at compile time.
a, b = b, a  # Will never raise due to unpacking
a, b = b, a, c  # Will always raise, 3 items cannot unpack to 2
Added support for optimization of the xrange built-in for Python2.
Python2: Added support for xrange iterable constant values, pre-building those constants ahead of time.
Python3: Added support for range iterable constant values, pre-building those constants ahead of time. This brings optimization support for Python3 ranges to what was available for Python2 already.
Avoid having a special node variant for range with no arguments; instead, create the exception raising node directly.
Specialized constant value nodes are using less generic implementations to query e.g. their length or iteration capabilities, which should speed up many checks on them.
Added support for the format built-in.
Python3: Added support for the ascii built-in.
Organizational
- The movement to pure C got the final big push. All C++-only idioms were removed, and everything works with C11 compilers. A C++03 compiler can be used as a fallback, in case of MSVC or too old gcc for instance.
- Using pure C, MinGW64 6x is now working properly. The latest version had problems with hypot related changes in the C++ standard library. Using C11 solves that.
- This release also prepares Python 3.6 support, it includes full language support on the level of CPython 3.6.0b1.
- The CPython 3.6 test suite was run with Python 3.5 to ensure bug level compatibility, and had a few findings of incompatibilities.
Cleanups
- The last holdouts of classes in Nuitka were removed, and many C++ idioms are no longer used.
- Moved range related helper functions to a dedicated include file.
- Using str is not bytes to detect Python3 str handling or actual bytes type existence.
- Trace collections were using a mix-in that was merged with the base class that every user of it was having.
Tests
- Added more static optimization tests; a lot more has become feasible to decide at compile time, and is now done. These are to detect regressions in that domain.
- The CPython 3.6 test suite is now also run with CPython 3.5 which found some incompatibilities.
Summary
This release marks a huge step forward. We now have the structure for type inference. This will expand in coming releases to cover more cases, and there are many low hanging fruits for optimization. Specialized code for variables of certain known shapes seems feasible now.
Then there is also the move towards pure C. This will make the backend compilation lighter, but due to using C11, we will not suffer any loss of convenience compared to "C-ish". The plan is to continue to use C++ for compilation with compilers not capable of supporting C11.
The amount of static analysis done in Nuitka is now going to expand quickly, with more and more constructs predicted to raise errors or simplified. This will be an ongoing activity, as many types of expressions need to be enhanced, and a single missing one can prevent optimization.
Also, it seems about time to add dedicated code for specific types to be as fast as C code. This opens up vast possibilities for acceleration and will eventually lead us to zero overhead C bindings. But initially the drive is towards enhanced import analysis, to become able to know the precise module expected to be imported, and to derive type information from this.
The coming work will start to attack whole program optimization, as well as enhanced local value shape analysis and specialized type code generation, which will make Nuitka faster.
Catalin George Festila
The python CacheControl module - part 002.
Today was a hard day, and that is the reason I am making this short tutorial.
Theory of HTTP:
HTTP specifies four response cache headers that you can set to enable caching: Cache-Control, Expires, ETag, and Last-Modified.
- Expiration caching - used to cache your entire response for a specific amount of time (e.g. 24 hours); simple, but cache invalidation is more difficult;
- Validation caching - more complex; used to cache your response, but allows you to dynamically invalidate it as soon as your content changes.
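The two strategies map to concrete response headers. Here is a minimal sketch of what each set of headers looks like; the helper function names are mine for illustration, not part of the CacheControl API:

```python
import hashlib
import time
from email.utils import formatdate

def expiration_headers(max_age_seconds):
    # Expiration caching: the response is declared fresh for a fixed window.
    return {
        "Cache-Control": "public, max-age=%d" % max_age_seconds,
        # Expires is the older, absolute-date form of the same idea.
        "Expires": formatdate(time.time() + max_age_seconds, usegmt=True),
    }

def validation_headers(body):
    # Validation caching: the client echoes these back as If-None-Match /
    # If-Modified-Since, and the server can answer 304 Not Modified
    # while they still match.
    return {
        "ETag": '"%s"' % hashlib.md5(body).hexdigest(),
        "Last-Modified": formatdate(usegmt=True),
    }

print(expiration_headers(86400)["Cache-Control"])  # public, max-age=86400
print(validation_headers(b"hello")["ETag"])
```

A library like CacheControl reads exactly these headers on the client side to decide whether a cached response can be reused.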
I start with a simple class named DictCache. You can give it any name; it is a BaseCache subclass.
The next step is to show how you can access it.
One simple way is to view the page - the first session.
Things get more complex when you need to access data and info like:
'adapters', 'auth', 'cert', 'close', 'cookies', 'delete', 'get', 'get_adapter', 'head', 'headers', 'hooks', 'max_redirects', 'merge_environment_settings', 'mount', 'options', 'params', 'patch', 'post', 'prepare_request', 'proxies', 'put', 'rebuild_auth', 'rebuild_method', 'rebuild_proxies', 'redirect_cache', 'request', 'resolve_redirects', 'send', 'stream', 'trust_env', 'verify'
And that comes with the second session in this source code:
import requests
from cachecontrol import CacheControl
from cachecontrol.cache import BaseCache
class DictCache(BaseCache):
    def __init__(self, init_dict=None):
        self.data = init_dict or {}
    def get(self, key):
        return self.data.get(key, None)
    def set(self, key, value):
        self.data.update({key: value})
    def delete(self, key):
        self.data.pop(key)
print "first session requests"
sess = requests.session()
cached_sess = CacheControl(sess)
response = cached_sess.get('http://google.com')
print '=================='
print 'see page by add this: print response.text'
print '=================='
print "second session BaseCache"
sess2 = requests.session()
base=DictCache(sess2)
print '=================='
print "dir(base)"
print dir(base)
print '=================='
print"dir(base.data)"
print dir(base.data)
print '=================='
print"base.data.max_redirects"
print base.data.max_redirects
print '=================='
Tarek Ziade
Web Services Best Practices
The other day I stumbled, via Twitter, on a Reddit comment about micro-services. It really nailed down the best practices around building web services, and I wanted to use it as the basis for a blog post. So all the credit for this post goes to rdsubhas :)
Web Services in 2016
The notion of micro-services rose in the past 5 years to describe the fact that our applications are getting split into smaller pieces that need to interact to provide the same service we used to provide with monolithic apps.
Splitting an app in smaller micro services is not always the best design decision in particular when you own all the pieces. Adding more interactions to serve a request just makes things more complex and when something goes wrong you're just dealing with a more complex system.
People often think that it's easier to scale an app built with smaller blocks, but it's often not the case, and sometimes you just end up with a slower, over-engineered solution.
So why are we building micro-services ?
What really happened I think is that most people moved their apps to cloud providers and started to use the provider services, like centralized loggers, distributed databases and all the fancy services that you can use in Amazon, Rackspace or other places.
In the LAMP architecture, we're now building just one piece of the P and configuring up to 20 services that interact with it.
A good chunk of our daily jobs now is to figure out how to deploy apps, and even if tools like Kubernetes give us the promise of an abstraction on top of cloud providers, the reality is that you have to learn how AWS or another provider works to build something that works well.
Understanding how multi-zone replication works in RDS is mandatory to make sure you control your application behavior.
Because no matter how fancy and reliable all those services are, the quality of your application will be tied to its ability to deal with problems like network splits, timeouts, etc.
That's where the shift in best practices is: when something goes wrong, it's harder to just tail your postgres logs and your Python app and see what's going on. You have to deal with many parts.
Best Practices
I can't find the original post on Reddit, so I am just going to copy it here and curate it with my own opinions and with the tools we use at Mozilla. I've also removed what I see as redundant tips.
Basic monitoring, instrumentation, health check
We use statsd everywhere and services like Datadog to see what's going on in our services.
We also have two standard heartbeat endpoints that are used to monitor the services. One is a simple round trip where the service just sends back a 200, and one is more of a smoke test, where the service tries to use all of its own backends to make sure it can reach them and read/write into them.
We make this distinction because the simple round-trip health check is hit very often, while the one that calls all the backends is hit less often, to avoid generating too much traffic and load.
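The two endpoints described above can be sketched without any particular framework; the function names and backend checks here are hypothetical, not Mozilla's actual endpoints:

```python
def heartbeat():
    # Cheap round trip: just prove the process is up. Safe to hit very often.
    return 200, {"status": "ok"}

def deep_heartbeat(backends):
    # Smoke test: try every backend this service depends on.
    # `backends` maps a name to a zero-argument check returning True/False.
    results = {name: check() for name, check in backends.items()}
    return (200 if all(results.values()) else 503), results

# Two fake backend checks; a real service would ping its database,
# cache, message queue, and so on, each with a short timeout.
print(heartbeat())
print(deep_heartbeat({"db": lambda: True, "cache": lambda: False}))
```

Returning 503 when any dependency fails lets a load balancer or monitoring system distinguish "process alive" from "fully functional".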
Distributed logging, tracing
Most of our apps are in Python, and we use Sentry to collect tracebacks and sometimes New Relic to detect problems we could not reproduce in a dev environment.
Isolation of the whole build+test+package+promote for every service.
We use Travis-CI to trigger most of our builds, tests and packages. Having reproducible steps made in an isolated environment like a CI gives us good confidence on the fact that the service is not spaghetti-ed with other services.
The bottom line is that "git pull && make test" should work in Travis no matter what, without calling an external service. The Travis YML file, the Makefile and all the mocks in the tests are roughly our 3 gates to the outside world. That's as far as we go in terms of build standards.
Maintain backward compatibility as much as possible
The initial tip included forward compatibility. I've removed it, because I don't think it's really a thing when you build web services. Forward compatibility means that an older version of your service can accept requests from a newer version of the client side. But I think that should just be a deployment issue and error management on the client side, so you don't bend your data design just so it works with older service versions.
For backward compatibility though, I think it's mandatory to make sure that you know how to interact with older clients, whatever happens. Depending on your protocol, older clients could get an update triggered, partially work, or just work fine -- but you have to get this story right even before the first version of your service is published.
But if your design has dramatically changed, maybe you need to accept the fact that you are building something different, and just treat it as a new service (with all the pain that brings if you need to migrate data.)
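One common way to honor backward compatibility at the payload level is to accept both the old and the new shape of a request. This is a hypothetical sketch of the idea, not the scheme any particular service uses:

```python
def parse_upload(payload):
    # v1 clients sent a bare list of records; v2 clients wrap them in an
    # envelope with an explicit schema version. Both keep working against
    # the same endpoint.
    if isinstance(payload, list):  # old client: upgrade the shape in place
        return {"version": 1, "records": payload}
    if isinstance(payload, dict) and payload.get("version") == 2:
        return {"version": 2, "records": payload["records"]}
    raise ValueError("unsupported payload shape")

print(parse_upload([{"id": 1}]))                               # v1 client
print(parse_upload({"version": 2, "records": [{"id": 1}]}))    # v2 client
```

Normalizing both shapes at the edge keeps the rest of the service dealing with a single internal representation, which is what lets you support older clients without bending your data design.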
Firefox Sync was one complex service to migrate from its first version to its latest version because we got a new authentication service along the way.
Ready to do more TDD
I just want to comment on this tip. Doing more TDD implies that it's fine to do less TDD when you build software that's not a service.
I think this is bad advice. You should simply do TDD right. Not less or more, but right.
Doing TDD right in my opinion is :
- 100% coverage unless you have something very specific you can't mock.
- Avoid over-mocking at all costs because testing mocks is often slightly different from testing the real stuff.
- Make sure your tests pass all the time, and are fast to pass, otherwise people will just start to skip them.
- Functional tests are generally superior to unit tests for testing services. I often drop unit tests in some services projects because everything is covered by my functional tests. Remember: you are not building a library.
Have engineering methodologies and process-tools to split down features and develop/track/release them across multiple services (xp, pivotal, scrum)
That's a good tip. Trying to reproduce what has worked when building a service, to build the next one is a great idea.
However, this will only work if the services are built by the same team, because the whole engineering methodology is adopted and adapted by people. You don't shove the SCRUM methodology into people's faces and assume everyone will work as described in the book. This never happens. What usually happens is that every member of the team brings their own recipes for how things should be done, which tracker to use, which parts of XP make sense to them, and the team creates its own custom methodology out of all this. And it takes time.
Start a service with a new team, and that whole phase starts again.



