Because it’s Friday: 3-D Animation

David Smith — Fri, 08 Dec 2017 22:50:26 +0000

We've had 3-D animation for quite a while now, of course, but what happens when a traditional 2-D animator uses a virtual reality system to draw? When famed Disney animator Glen Keane sketches his most iconic creation — Ariel from The Little Mermaid — using Tilt Brush, the result is surprisingly moving.

That's all from us for this week. We'll be back with more on Monday, and in the meantime have a great weekend!

In case you missed it: November 2017 roundup

David Smith — Thu, 07 Dec 2017 21:54:02 +0000

In case you missed them, here are some articles from November of particular interest to R users.

R 3.4.3 "Kite-Eating Tree" has been released.

Several approaches for generating a "Secret Santa" list with R.

The "RevoScaleR" package from Microsoft R Server has now been ported to Python.

The call for papers for the R/Finance 2018 conference in Chicago is now open.

Give thanks to the volunteers behind R.

Advice for R user groups from the organizer of R-Ladies Chicago.

Use containers to build R clusters for parallel workloads in Azure with the doAzureParallel package.

A collection of R scripts for interesting visualizations that fit into a 280-character Tweet.

R is featured in a StackOverflow case study at the Microsoft Connect conference.

The City of Chicago uses R to forecast water quality and issue beach safety alerts.

A collection of best practices for sharing data in spreadsheets, from a paper by Karl Broman and Kara Woo.

The MRAN website has been updated with faster package search and other improvements.

The curl package has been updated to use the built-in winSSL library on Windows.

Beginner, intermediate and advanced on-line learning plans for developing AI applications on Azure.

A recap of the EARL conference (Effective Applications of the R Language) in Boston.

Giora Simchoni uses R to calculate the expected payout from a slot machine.

An introductory R tutorial by Jesse Sadler focuses on the analysis of historical documents.

A new RStudio cheat sheet: "Working with Strings".

An overview of generating distributions in R via simulated gaming dice.

An analysis of StackOverflow survey data ranks R and Python among the most-liked and least-disliked languages.

And some general interest stories (not necessarily related to R):

Siri transcribes a trombone player
A collection of short videos of interesting chemical reactions
An animation shows the impact of a rogue drone on Gatwick airport
An AI synthesizes novel images of furniture, animals, and celebrities

As always, thanks for the comments and please send any suggestions to me at davidsmi@microsoft.com. Don't forget you can follow the blog using an RSS reader, via email using blogtrottr, or by following me on Twitter (I'm @revodavid). You can find roundups of previous months here.

What’s new for Python in Visual Studio 2017 15.6 Preview 1

Steve Dower [MSFT] — Thu, 07 Dec 2017 20:00:32 +0000

Today we have released the first preview of our next update to Visual Studio 2017. You will see a notification in Visual Studio within the next few days, or you can download the new installer from visualstudio.com.

In this post, we're going to take a look at some of the new features we have added for Python developers. As always, the preview is a way for us to get features into your hands early so you can provide feedback and we can identify issues with a smaller (and hopefully more forgiving!) audience. If you encounter any trouble, please use the Report a Problem tool to let us know.

Immediate IntelliSense updates with no database

Remember how every time you installed or updated a package we would make you wait for hours while we "refresh" our "completion DB"? No more! In this update we are fundamentally changing how we handle this for installed Python environments, including virtual environments, so that we can provide IntelliSense immediately without the refresh.

This has been available as an experimental feature for a couple of releases, and we think it's ready to turn on by default. When you open the Python Environments window, you'll see the "IntelliSense" view is disabled and there is no longer a way to refresh the database -- because there is no database!

The new system works by doing lightweight analysis of Python modules as you import them in your code. This includes .pyd files, and if you have .pyi files alongside your original sources then we will prefer those. (See PEP 484 for details of .pyi files. In essence, these are Python "include" files that editors use to obtain information about Python modules; they contain no actual code, just function stubs with type annotations.)
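As a rough illustration of what such a stub might contain, here is a minimal sketch for an imagined module. The file name stats_utils.pyi and both function names are hypothetical, invented for this example rather than taken from any real package. Each body is just "...", so only the signatures and type annotations carry information.

```python
# stats_utils.pyi: hypothetical stub file for an imagined stats_utils module.
# A stub carries signatures and type annotations only; the real code lives
# in the module itself (a .py or .pyd file).
from typing import List, Optional

def mean(values: List[float]) -> float: ...

def find_outlier(values: List[float], threshold: float = 3.0) -> Optional[float]: ...
```

An editor that reads this file can offer parameter names, defaults and return types for both functions without importing or running the module.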

You should notice some improvements in IntelliSense for packages like pandas and scikit-learn, though there will likely be some packages that do not work as well as before. We are actively working on improving results for various code constructs, and you will also see better IntelliSense results as packages start including .pyi type hint files. We encourage you to post on this github issue to let us know about libraries that still do not work well.

(NOTE: If you install this preview alongside an earlier version of Visual Studio 2017, the preview of this feature will also be enabled in the earlier version. You can go back to the old model by disabling the feature in Preview. To do this, open Tools, Options, find the Python/Experimental page, deselect "Use new style IntelliSense" and restart both versions of Visual Studio.)

conda integration

If you use Anaconda, you likely already manage your environments and packages using the conda tool. This tool installs pre-built packages from the Anaconda repository (warning: long page) and manages compatibility with your environment and the other packages you have installed.

For this preview of Visual Studio, we have added two experimental options to help you work with Anaconda:

Automatically detect when conda is a better option for managing packages
Automatically detect any Anaconda environments you have created manually

To enable either or both of these features, open Tools, Options, find the Python/Experimental page, and select the check box. For this preview we are starting with both disabled to avoid causing unexpected trouble, but we intend to turn them on by default in a future release.

With "Automatically detect Conda environments" enabled, any environments created by the conda tool will be detected and listed in the Python Environments window automatically. You can open interactive windows for these environments, assign them in projects or make them your default environment.

With the "Use Conda package manager when available" option enabled, any environments that have conda installed will use that for searching, installing and updating packages instead of pip. Very little will visibly change, but we hope you'll be more successful when adding or removing packages in your environment.

Notice that these two options work independently: you can continue to use pip to manage packages if you like, even if you choose to detect environments that were created with conda. If you are an Anaconda user, you will likely want to enable both options. However, if you do this and encounter issues, disabling each one in turn and then reporting any differences will help us quickly narrow down the source.

Other improvements

We have made a range of other minor improvements and bug fixes throughout all of our Python language support and there are more to come.

Our "IPython interactive mode" is now using the latest APIs, with improved IntelliSense and the same module and class highlighting you see in the editor.

There are new code snippets for the argparse module. Start typing "arg" in the editor to see what is available.

We've also added new color customization options for docstrings and regular expression literals (under Tools, Options, Fonts and Colors). Doc strings have a new default color.

If you encounter any issues, please use the Report a Problem tool to let us know (this can be found under Help, Send Feedback) or continue to use our github page. Follow our blog to make sure you hear about our updates first, and thanks for using Visual Studio!

The British Ecological Society’s Guide to Reproducible Science

David Smith — Wed, 06 Dec 2017 22:26:26 +0000

The British Ecological Society has published a new volume in their Guides to Better Science series: A Guide to Reproducible Code in Ecology and Evolution (pdf). The introduction describes its scope:

A Guide to Reproducible Code covers all the basic tools and information you will need to start making your code more reproducible. We focus on R and Python, but many of the tips apply to any programming language. Anna Krystalli introduces some ways to organise files on your computer and to document your workflows. Laura Graham writes about how to make your code more reproducible and readable. François Michonneau explains how to write reproducible reports. Tamora James breaks down the basics of version control. Finally, Mike Croucher describes how to archive your code. We have also included a selection of helpful tips from other scientists.

The guide proposes a simple reproducible project workflow, and a guide to organizing projects for reproducibility. The Programming section provides concrete tips and traps to avoid (example: use relative, not absolute pathnames), and the Reproducible Reports section provides a step-by-step guide for generating reports with R Markdown.

While written for an ecology audience (and also including some gorgeous photography of animals), this guide would be useful for anyone in the sciences looking to implement a reproducible workflow. You can download the guide at the link below.

British Ecological Society: A Guide to Reproducible Code in Ecology and Evolution (via Laura Graham)

Music Generation with Azure Machine Learning

Cortana Intelligence and ML Blog Team — Wed, 06 Dec 2017 17:00:31 +0000

This post is authored by Erika Menezes, Software Engineer at Microsoft.

Using deep learning to learn feature representations from near-raw input has been shown to outperform traditional task-specific feature engineering in several domains, including object recognition, speech recognition and text classification. With the recent advancements in neural networks, deep learning has been gaining popularity in computational creativity tasks such as music generation. There has been great progress in this field via projects such as Magenta, an open-source project from the Google Brain team focused on creating machine learning projects for art and music, and Flow Machines, which has released an entire AI-generated pop album. For those of you who are curious about music generation, you can find additional resources here.

The goal of our work is to provide data scientists who are new to the field of music generation with guidance on how to create deep learning models for music generation. As a sample, here is music that was generated by training an LSTM model.

In this post, we show you how to build a deep learning model for simple music generation using the Azure Machine Learning (AML) Workbench for experimentation.

Here are the most important components for a deep learning model for music generation:

Dataset: The data used for training the model. In this work we will use the scale-chords dataset.
Input Representation: A meaningful vector representation of music notes. In this work we will use a piano roll representation.
Model Architecture: The deep learning model architecture for learning the task of predicting some set of musical notes, given an input of preceding musical notes. This work uses a Sequence-to-Sequence model using multi-layered Long Short-Term Memory (LSTM) to achieve this.

Dataset

Music is available in a variety of digital audio formats ranging from raw audio (WAV) to more semantic representations such as MIDI (Musical Instrument Digital Interface), ABC, and sheet music. MIDI data already contains the information needed to feed the Deep Neural Network; we just need to transform it into an appropriate numeric representation to train the model. The Input Representation section below discusses the details of this transformation.

For this work we will use the scale-chords dataset from here. Download the dataset (free small pack) that contains 156 scale chords files in MIDI format. Let’s take a closer look at MIDI.

MIDI

MIDI is a communications protocol for electronic musical instruments. A Python representation of a MIDI file looks something like this:

MIDI represents time in ‘ticks’ which essentially represent delta times, i.e. each event’s tick is relative to the previous one. Each MIDI file’s header contains the resolution of that file which gives us the number of ticks per beat. The MIDI file consists of one or more tracks that further consist of event messages such as the following:

SetTempoEvent: Indicates the tempo in 8-bit words.
NoteOnEvent: Indicates that a note has been pressed or turned on.
NoteOffEvent: Indicates that a note has been released or turned off.
EndOfTrackEvent: Indicates that the track has ended.

Music Theory 101

Beat: Basic unit of time in music, a.k.a. quarter note.
Note: Pitch or frequency of the note played. E.g. note 60 in MIDI is middle C on the piano (C4 in scientific pitch notation, though some software labels it C5), which is about 261.63 Hz.
Tempo: Expressed as Beats per minute (BPM) = Quarter notes per minute (QPM)
Microseconds per quarter note (MPQN) = MICROSECONDS_PER_MINUTE / BPM.

Input Representation

Input Representation is a crucial component of any music generation system. To feed the MIDI files to the Neural Network, we transform the MIDI to a piano roll representation. A piano roll is simply a 2D matrix of notes vs. time, where the black squares represent a note being pressed at that point in time. We use the MIDO python library to achieve this. The figure below shows a sample piano roll for notes in one octave.

To transform the MIDI file shown above into a piano roll we need to quantize the MIDI events by time. The important trick to quantize the MIDI events is to understand how to convert MIDI ticks to absolute time. To do this we multiply the tempo (beats per minute) by the resolution (ticks per beat), which gives us the ticks per minute; dividing that by 60 gives us the ticks per second.
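The conversion can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions, not the code from the project: it assumes a single constant tempo for the whole file, and the function names are made up for this example. Ticks per second is computed as BPM times resolution, divided by 60.

```python
MICROSECONDS_PER_MINUTE = 60_000_000

def bpm_to_mpqn(bpm):
    """Microseconds per quarter note (MPQN) for a tempo in beats per minute."""
    return MICROSECONDS_PER_MINUTE / bpm

def ticks_per_second(bpm, resolution):
    """Ticks per second, given tempo (BPM) and resolution (ticks per beat)."""
    return bpm * resolution / 60.0

def delta_ticks_to_seconds(delta_ticks, bpm, resolution):
    """Turn per-event delta ticks into absolute event times in seconds.

    Assumes one constant tempo; a real file may contain tempo changes.
    """
    rate = ticks_per_second(bpm, resolution)
    absolute_times = []
    elapsed_ticks = 0
    for delta in delta_ticks:
        elapsed_ticks += delta  # each delta is relative to the previous event
        absolute_times.append(elapsed_ticks / rate)
    return absolute_times

# Four events at 120 BPM with 480 ticks per beat.
times = delta_ticks_to_seconds([0, 480, 480, 960], bpm=120, resolution=480)
print(times)
```

Once every event has an absolute time, quantizing amounts to dividing by the chosen step length and rounding to get a piano roll column index.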

Model Architecture

Recurrent Neural Networks (RNN) are well suited for sequence prediction tasks as they can memorize long-range dependencies from input sequences using recurrent or looped connections. LSTMs are a special type of RNN that have multiplicative gates that enable them to retain memory for even longer sequences, making them useful for learning the sequential patterns present in musical data.

With this reasoning we decided to use an LSTM Sequence-to-Sequence model as shown in the figure below.

A Sequence-to-Sequence model (Seq2Seq) is made up of an Encoder (encode input) and Decoder (decode output) to convert sequences from one domain, such as sentences in English, to sequences in another domain, such as the same sentences translated into French. This has been commonly used for machine translation or for freeform question answering. As in language translation, in music, the notes played during a given time period depend on several preceding notes, and Sequence-to-Sequence models are able to generate output sequences after seeing the entire input.

In the figure shown above, we train the network to generate some length of music notes given some preceding notes. In order to create the training set we use a sliding window over the piano roll. Consider the case where we have a piano roll of dimensions 12×10, where 12 is the number of notes in one octave and 10 is the number of columns in a piano roll, where each column represents some absolute time. Assuming a sliding window of 5, the first 5 columns are fed to the encoder as the input and the next 5 are the target which the model tries to learn. Since we are generating polyphonic music, i.e. multiple notes being on at the same time, this is a multi-label classification problem and hence we use the binary cross entropy loss.
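To make the windowing concrete, here is a small sketch in plain Python (lists rather than arrays, so it runs without extra packages). It is not the project's implementation: the function names, the toy piano roll, and the choice to advance the window one column at a time are all assumptions made for illustration. It also includes a bare-bones binary cross-entropy, averaged over notes, to show why each note is scored as its own on/off label.

```python
import math

def make_training_pairs(piano_roll, window):
    """Slide a window along the time axis of a notes x time piano roll.

    Returns (encoder_input, decoder_target) pairs, each a notes x window slice.
    The window advances one column at a time (an assumption for this sketch).
    """
    n_steps = len(piano_roll[0])
    pairs = []
    for start in range(n_steps - 2 * window + 1):
        encoder_input = [row[start:start + window] for row in piano_roll]
        decoder_target = [row[start + window:start + 2 * window] for row in piano_roll]
        pairs.append((encoder_input, decoder_target))
    return pairs

def binary_cross_entropy(targets, predictions, eps=1e-7):
    """Mean binary cross-entropy over a list of independent on/off labels."""
    total = 0.0
    for y, p in zip(targets, predictions):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(targets)

# Toy 12 x 10 roll: note i is "on" at step t when (i + t) is a multiple of 4.
roll = [[1 if (note + t) % 4 == 0 else 0 for t in range(10)] for note in range(12)]
pairs = make_training_pairs(roll, window=5)
print(len(pairs))
```

With only 10 columns and a window of 5, there is room for exactly one input/target pair; a longer piano roll yields one additional pair per extra column.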

Once we are done training our network, we can carry out testing that will generate some music for us. In this case, test data is fed to the encoder and the outputs from the decoder represent the music generated by the model!

It should be noted that one limitation of Sequence-to-Sequence models is that they get overwhelmed when given very long inputs, and they need other sources of context such as attention in order to focus on specific parts of the input automatically. We will not be going over this aspect as it’s beyond the scope of this post.

From a higher-level perspective, the flow of data through the system looks like this:

The next section walks you through how to set this up with Azure ML Workbench.

Getting Started with Azure Machine Learning

Azure Machine Learning provides data scientists and ML developers with a toolset for data wrangling and experimentation and it includes the following:

AML Workbench. See setup and installation documentation.
AML Experimentation Service. See configuration documentation.
AML Model Management. See manage and deploy documentation.

We provisioned Data Science VMs (DSVMs) with GPUs and used the remote Docker execution environment provided by Azure ML Workbench (see details and more information on execution targets) for training models. Azure ML allows you to track your run history and model metrics through the Azure ML Logging API which helps us compare different experiments and compare results visually.

Training a Music Generation Model

The Sequence-to-Sequence model described in the previous section is implemented using Keras. In this section we are going to focus on the training setup.

Using Azure ML Workbench for Training on a Remote VM

The Azure ML Workbench provides an easy way to scale out to environments such as a Data Science VM with GPUs that enables faster training for deep learning models and isolated, reproducible, and consistent runs for all your experiments. The code for this project can be found here.

Step 1: Set up remote VM as execution target

az ml computetarget attach --name "my_dsvm" --address "my_dsvm_ip_address" --username "my_name" --password "my_password" --type remotedocker

Step 2: Configure my_dsvm.compute

baseDockerImage: microsoft/mmlspark:plus-gpu-0.7.91
nvidiaDocker: true

Step 3: Configure my_dsvm.runconfig

EnvironmentVariables:
  "STORAGE_ACCOUNT_NAME":
  "STORAGE_ACCOUNT_KEY":
Framework: Python
PrepareEnvironment: true

We use Azure storage for storing training data, pre-trained models and generated music. The storage account credentials are provided as EnvironmentVariables.

Step 4: Conda_dependencies.yml

Step 5: Prepare the remote machine

az ml experiment prepare -c my_dsvm

Step 6: Run the experiment

az ml experiment submit -c my_dsvm MusicGeneration/train.py

Evaluation Workflow

Comparing Runs

As part of experimenting with machine learning models, we would like to compare effects of different batch sizes and model hyper-parameters. We can visualize this in the run history for different epoch sizes and compare different runs with custom outputs as shown below.

The figure below shows how we can compare runs for 10, 50 and 100 epochs and look at the corresponding loss curves.

Scoring = Music Generation!

Now you can generate music by loading the models created in the training step and calling model.predict() to generate some music. The code for this is in MusicGeneration/score.py.

az ml experiment submit -c my_dsvm MusicGeneration/score.py

Summary

In this blog post, we showed you how to build your own deep learning music generation model using Azure Machine Learning. This gives you a framework for agile experimentation with fast iterations and provides an easy path for scaling up and out to remote environments such as Data Science VMs with GPUs.

Once you have an end-to-end deep learning model that can produce music, you can experiment with different sequence lengths and different model architectures and listen to their effects on the music generated. Happy music generation!

Erika

Resources

Code for this project can be found here.
Azure Machine Learning documentation.
Dataset: Scale-chords from feelyoursound.com.

Acknowledgements

Thanks to Wee Hyong Tok and Mathew Salvaris for their guidance and for reviewing this article, and to Matt Winkler, Hai Ning and Serina Kaye for all their help with Azure Machine Learning.

On the biases in data

David Smith — Tue, 05 Dec 2017 15:15:53 +0000

Whether we're developing statistical models, training machine learning recognizers, or developing AI systems, we start with data. And while the suitability of that data set is, lamentably, sometimes measured by its size, it's always important to reflect on where those data come from. Data are not neutral: the data we choose to use have profound impacts on the resulting systems we develop. A recent article in Microsoft's AI Blog discusses the inherent biases found in many data sets:

“The people who are collecting the datasets decide that, ‘Oh this represents what men and women do, or this represents all human actions or human faces.’ These are types of decisions that are made when we create what are called datasets,” she said. “What is interesting about training datasets is that they will always bear the marks of history, that history will be human, and it will always have the same kind of frailties and biases that humans have.”
— Kate Crawford, Principal Researcher at Microsoft Research and co-founder of AI Now Institute.

“When you are constructing or choosing a dataset, you have to ask, ‘Is this dataset representative of the population that I am trying to model?’”
— Hanna Wallach, Senior Researcher at Microsoft Research NYC.

The article discusses the consequences of using data sets that aren't representative of the populations they are meant to describe, and also the consequences of the lack of diversity in the fields of AI research and implementation. Read the complete article at the link below.

Microsoft AI Blog: Debugging data: Microsoft researchers look at ways to train AI systems to reflect the real world

AI School: Microsoft R and SQL Server ML Services

David Smith — Mon, 04 Dec 2017 20:40:45 +0000

If you'd like to learn how to use R to develop AI applications, the Microsoft AI School now features a learning path focused on Microsoft R and SQL Server ML Services. This learning path includes eight modules, each comprising detailed tutorials and examples:

All of the Microsoft AI School learning paths are free to access, and the content is hosted on Github (where feedback is welcome!). You can access this course and many others at the link below.

Microsoft AI School: Microsoft R and SQL Server ML Services

Microsoft @ NIPS 2017, Long Beach, CA

Cortana Intelligence and ML Blog Team — Fri, 01 Dec 2017 21:21:30 +0000

Re-posted from the Microsoft Research blog.

The thirty-first annual conference on Neural Information Processing Systems (NIPS) starts on Monday next week, and is being held in Long Beach, CA, from December 4th through 9th, 2017.

The event, which is completely sold out, is a multi-track machine learning and computational neuroscience conference that includes invited talks, demonstrations, symposia and oral and poster presentations of refereed papers.

Microsoft has always had a strong presence at NIPS and this year is no different. We have several employees taking part at the event, including as organizing committee members, workshop and symposium organizers, invited speakers and more. Our researchers and engineers have co-authored dozens of accepted papers, contributed to several posters and are also involved in key workshops that are a part of the event.

For complete details, including the list of accepted papers, posters and workshops from our team, as well as links to ML-related job opportunities at Microsoft, be sure to check out the original post here.

We look forward to meeting several of you at NIPS Long Beach next week!

ML Blog Team

Because it’s Friday: The Whole of the Moon

David Smith — Fri, 01 Dec 2017 21:11:52 +0000

As we've noted before, the Solar System is a big place. You can watch a voyage from the Sun to Jupiter, and it takes 45 minutes at the speed of light. A scale model of the Solar System, with the Sun the size of a weather balloon, is 3.5 miles across ... and that's not even including Pluto. And this virtual scale model, a browser-based rendition by John Worth with the moon just one pixel in size, is no less impressive. I can't do it justice here — the site surely holds the record for the widest horizontal scrollbar on any page on the Web — so here's a little snippet of the Earth-Moon system:

While you can use the astrological symbols at the top to jump to the planets, try manually scrolling to get the full effect of the vast spaces between them. There are also some useful tidbits to discover along the way...

That's all from us here at the blog for this week. Have a great weekend, and we'll see you back here on Monday. Enjoy!

A case study in messy data analysis: the Australian same-sex marriage survey

David Smith — Fri, 01 Dec 2017 19:23:49 +0000

Last month the Australian people signaled their approval of legalizing same-sex marriage by a 62%:38% margin in a national survey. (On a personal note, I was elated and relieved by the result: my husband and I have discussed eventually retiring to Australia, and with this decision our marriage would be recognized there.) While fears of a surprise Brexit-like electoral backlash proved unfounded, researchers including R user Miles McBain explored the results for correlations to demographic variables. This process wasn't as simple as it might have been though: the Australian Bureau of Statistics released the results as a pair of Excel files that violate just about every good practice for sharing data in spreadsheets:

Miles shares the R code he used to extract useful data from this spreadsheet as a blog post that makes a great case study in dealing with messy data using R. The post demonstrates how he used the read_excel function (from the readxl package) to extract specific sub-tables from the spreadsheet by specifying row and column ranges, and then used the dplyr package to clean up and merge the data. If you want to explore the data yourself, you can find the R code and the source data in this Github repository.

In a follow-up post, Miles combines the same-sex marriage survey data with Australian Census data to explore various demographic relationships. Unlike the US Census data (which is easily accessible in R thanks to the tidycensus package), there's no interface package for Australian Census data. (Selected tables are available in the Census2016 package, however.) Instead, Miles demonstrates how to use R to download and extract data from the "Census DataPacks" (CSV data files and Excel data dictionaries) provided by the Australian Bureau of Statistics. Yet more data wrangling allows Miles to create summary charts of the responses, such as this chart of proportion voting No by percent of the district population declaring a religious affiliation, broken down by state. As you may expect, those districts with more religious populations voted No at greater rates.

Both of these posts provide great examples of working with government data, which is often provided in inconvenient formats with messy structures. Follow the links below for step-by-step guides, including the R code used to extract the data, structure it for analysis, and create useful charts.

Medium (Miles McBain): Tidying the Australian Same Sex Marriage Postal Survey Data with R; Combining Australian Census data with the Same Sex Marriage Postal Survey in R

Azure Database Migration Service Preview brings the “lift and shift”

SQL Server Team — Thu, 30 Nov 2017 20:00:15 +0000

It’s time to get excited—the Azure Database Migration Service Public Preview is here to help you move on-premises SQL Servers to the cloud with near-zero downtime.

There’s never been a better time to start migrating your database to the cloud. Getting ahead in business today means moving to the cloud so that your company can grow and succeed.

The Azure Database Migration Service Preview is the quickest and easiest way for businesses to migrate on-premises databases to the cloud. It’s a fully managed migration service designed to work together with our time-tested migration engines such as the Data Migration Assistant, the Database Experimentation Assistant, and SQL Server Migration Assistant. These tools are tuned to ensure the best migration experience, whether upgrading from legacy versions of Microsoft SQL Server or moving from sources such as Oracle, Sybase, DB2, MySQL, and others. The new service uses a guided, easy-to-implement process to streamline tasks. Regular operations can continue normally during the migration.

Azure Database Migration Service Public Preview uses this functionality to provide rich orchestration capabilities that enable you to organize your databases into project(s) and perform source assessment, schema and data conversion, and validation activities. These activities can be assigned to one or more compute nodes to meet your budget and timeframe goals. With these capabilities, Azure Database Migration Service Public Preview makes it easy to plan migration tasks, run proof-of-concept migrations, author scripting for automation, and ensure that your final production migration to the Microsoft Data Platform is friction-free.

All these capabilities allow you to focus on what’s important for your business and accelerate your Data Estate transformation with the cloud. Now is a great time to find out more about how to make your organization’s database migration as smooth and seamless as possible. To start, check out the Database Migration Guide, which allows you to customize your migration plan based on your organization’s source and target data platforms, along with other key criteria. For a broad scope of information about cloud migration in general, visit the Azure Migration Center.

Most importantly, be sure to take advantage of the Azure Database Migration Service Public Preview today.

R 3.4.3 released

David Smith — Thu, 30 Nov 2017 18:18:45 +0000

R 3.4.3 has been released, as announced by the R Core team today. As of this writing, only the source distribution (for those that build R themselves) is available, but binaries for Windows, Mac and Linux should appear on your local CRAN mirror within the next day or so.

This is primarily a bug-fix release. It fixes an issue with incorrect time zones on MacOS High Sierra, and some issues with handling Unicode characters. (Incidentally, representing international and special characters is something that R takes great care in handling properly. It's not an easy task: a 2003 essay by Joel Spolsky describes the minefield that is character representation, and not much has changed since then.) You can check out the complete list of changes here. Whatever your platform, R 3.4.3 should be backwards-compatible with other R versions in the R 3.4.x series, and so your scripts and packages should continue to function as they did before.

The codename for this release is "Kite-Eating Tree", and as with all R codenames this is a reference to a classic Peanuts episode. If you're interested in the source of other R release names, Lucy D'Agostino McGowan provides the Peanuts references for R release names back to R 2.14.0.

r-devel mailing list: R 3.4.3 is released

How to generate a Secret Santa list with R

David Smith — Wed, 29 Nov 2017 22:49:23 +0000

Several recent blog posts have explored the Secret Santa problem and provided solutions in R. This post provides a roundup of various solutions and how they are implemented in R.

If you wanted to set up a "Secret Santa" gift exchange at the office, you could put everyone's name into a hat and have each participant draw a name at random. The problem is that someone might draw their own name, but if that happens you can just reshuffle all the names back into the hat and start the process over. That's essentially what the R code below, from a blog post by David Selby, does:

🙈🎁 w/ code:
"Secret Santa in R" ✏ @TeaStats https://t.co/HFrh8w6CWU #rstats pic.twitter.com/cGfNA45CbG
— Mara Averick (@dataandme) November 24, 2017

That's not an entirely satisfying solution (at least to me), with all of the having to check for self-giving and restarting if so. Thomas Lumley calculates that the chance of requiring a do-over is about 63% (for more than 2 participants, anyway), and on average you'd need about 2.7 tries to get a "valid" gift list. (This is an example of the negative binomial distribution in action: we keep on drawing a set of names from the hat, until we get a set that has no-one giving to themselves. You can generate 100 examples of this playing out in R with rnbinom(100,1, exp(-1))+1 — the +1 is for the final successful draw from the hat.)
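To make the "draw from the hat and redo if needed" process concrete, here is a minimal sketch in base R. It is not David Selby's code; the function name `draw_from_hat` and the participant names are made up for illustration, and it assumes at least two participants with unique names.

```r
# Draw names from the hat until nobody has drawn their own name.
# Hypothetical helper: returns the gift list and the number of draws it took.
# Assumes `people` holds at least two unique names (with one name it never ends).
draw_from_hat <- function(people) {
  tries <- 0
  repeat {
    tries <- tries + 1
    recipients <- sample(people)            # shuffle all the names in the hat
    if (!any(recipients == people)) break   # valid draw: no self-giving
  }
  list(assignment = data.frame(santa = people, recipient = recipients,
                               stringsAsFactors = FALSE),
       tries = tries)
}

set.seed(2017)  # arbitrary seed, only so the run is repeatable
people <- c("Ann", "Bob", "Cho", "Dev", "Eve", "Fay")  # made-up names
result <- draw_from_hat(people)
result$assignment

# Average number of draws needed over many repetitions
mean_tries <- mean(replicate(10000, draw_from_hat(people)$tries))
mean_tries
```

With six participants, the average number of draws should land close to exp(1), or about 2.7, in line with the figure quoted above.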

An easier way might be simply to seat all the participants in random order in a circle and assign them to give a gift to the person on their right. This is easy to do in R, and doing it in code has the benefit of keeping the recipients secret from each other. First, let's select 10 names from the babynames dataset:

> library(babynames)
> santas <- sample(babynames$name, 10, prob=babynames$prop)

(Using prob=babynames$prop selects names according to their prevalence in US births 1880-2015.) Then, it's a simple matter of reordering the names at random, and assigning a gift to the next in line:

> p <- sample(santas)
> cbind(santa=p, recipient=c(tail(p,-1),p[1]))
      santa       recipient  
 [1,] "Sherman"   "Shayna"   
 [2,] "Shayna"    "Elizabeth"
 [3,] "Elizabeth" "Mary"     
 [4,] "Mary"      "Kathleen" 
 [5,] "Kathleen"  "Russell"  
 [6,] "Russell"   "James"    
 [7,] "James"     "Arlene"   
 [8,] "Arlene"    "Ruth"     
 [9,] "Ruth"      "Darryl"   
[10,] "Darryl"    "Sherman"

Now, this "sit in a circle" process isn't quite the same as the "keep on drawing names from the hat" process. In the above example, the 10 names form a cycle, and it will never happen that Arlene gives to Ruth and Ruth gives to Arlene. Nonetheless, I think it's the simplest and fairest way of generating a Secret Santa list.
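The two properties of the circle method (nobody gives to themselves, and no two people give to each other) can be checked directly. The sketch below wraps the two lines of code above in a hypothetical helper, `circle_santa`, and uses made-up names rather than the babynames data; it assumes more than two participants with unique names.

```r
# Seat everyone in a random circle; each person gives to the next one around.
# Hypothetical helper wrapping the sample() / cbind() steps shown earlier.
circle_santa <- function(people) {
  p <- sample(people)
  data.frame(santa = p, recipient = c(tail(p, -1), p[1]),
             stringsAsFactors = FALSE)
}

set.seed(42)  # arbitrary seed, only so the run is repeatable
people <- c("Ann", "Bob", "Cho", "Dev", "Eve", "Fay")  # made-up names
gifts <- circle_santa(people)

# Does anyone give to themselves?
self_giving <- any(gifts$santa == gifts$recipient)

# Does anyone receive a gift back from their own recipient?
# Look up who each recipient gives to in turn, and compare with the santa.
gives_back <- gifts$recipient[match(gifts$recipient, gifts$santa)]
reciprocal <- any(gives_back == gifts$santa)

c(self_giving = self_giving, reciprocal = reciprocal)
```

Both checks come back FALSE for any group of three or more, whatever the random seating turns out to be.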

Thinking of the gift-giving relationship as a graph, the process above generates a Hamiltonian cycle through all of the participants, and never splits the group into more than one cycle. Tristan Mahr explores the graph-theory nature of the Secret Santa problem in a blog post. There, he uses the DiagrammeR package to solve variants of the Secret Santa problem by constructing graphs, which can be created and visualized quite easily in R.
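One way to see the graph structure without any graph package is to count the cycles in a gift list by following each santa to their recipient until the chain returns to where it started. This is only a base-R sketch, not Tristan Mahr's DiagrammeR approach; `count_cycles` and the names are hypothetical, and it assumes every recipient also appears exactly once as a santa.

```r
# Count the cycles in a gift list by following santa -> recipient links.
# Hypothetical helper; assumes the list is a permutation of unique names.
count_cycles <- function(santa, recipient) {
  nxt <- match(recipient, santa)   # row index of the person each santa gives to
  seen <- rep(FALSE, length(santa))
  cycles <- 0
  for (start in seq_along(santa)) {
    if (seen[start]) next          # already part of a cycle counted earlier
    cycles <- cycles + 1
    i <- start
    while (!seen[i]) {             # walk the chain until it closes
      seen[i] <- TRUE
      i <- nxt[i]
    }
  }
  cycles
}

people <- c("Ann", "Bob", "Cho", "Dev", "Eve", "Fay")  # made-up names

# Circle method: everyone sits in one ring
set.seed(1)  # arbitrary seed
p <- sample(people)
circle_cycles <- count_cycles(p, c(tail(p, -1), p[1]))

# A valid hat draw that splits into two rings of three:
# Ann -> Bob -> Cho -> Ann and Dev -> Eve -> Fay -> Dev
hat_cycles <- count_cycles(people, c("Bob", "Cho", "Ann", "Eve", "Fay", "Dev"))

c(circle = circle_cycles, hat = hat_cycles)
```

The circle method always yields a single cycle, while the hand-picked hat draw here yields two, which is the difference between the two processes in graph terms.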

Finally, Sarah Lotspeich and Lucy D'Agostino McGowan take the whole process one step further and show how to generate emails to each participant using the ponyexpress package, notifying them of their Secret Santa recipient.

SQL Server 2017: The world’s first enterprise-class diskless database

SQL Server Team — Wed, 29 Nov 2017 17:00:17 +0000

This post is authored by Bob Ward, Principal Architect, and Jamie Reding, Senior Program Manager and Performance Architect, Microsoft Database Systems Group.

Perhaps you saw the keynote at the recent PASS 2017 Summit where Microsoft demonstrated the performance of the world’s first enterprise-class “diskless database”. This demonstration showed how Microsoft, Hewlett Packard Enterprise (HPE), and SUSE Linux Enterprise Server partnered together to deliver > 5x performance on analytic queries directly against storage at up to 50 percent of the cost.

Recently HPE published a new world record 1TB TPC-H benchmark result¹ using this configuration with their DL380 Gen10 Server showing 1,009,065 QphH at an incredible price/performance of $0.47 USD per QphH. Performance and price are achieved by combining the power of SQL Server 2017, HPE’s scalable persistent memory, and SUSE Linux Enterprise 12 SP3 Persistent Memory Support.

HPE’s scalable persistent memory is an innovation that combines standard memory with the persistence of standard storage. It allows database engines like SQL Server 2017 to retrieve data from their data files in a matter of seconds.

To see this technology in action, check out this video. To learn more about this amazing result and technology, read HPE’s blog post and SUSE’s blog post. To learn more about SQL Server on SUSE Linux Enterprise Server, check out SUSE’s SQL Server on Linux website.

SQL Server 2017 is the world leader in TPC-H performance, price, and value and continues to demonstrate that it is one of the fastest databases on the planet, in your cloud or ours.

References

¹ 1TB non-clustered TPC-H result http://www.tpc.org/3331 as of November 21, 2017.

Cumulative Update #2 for SQL Server 2017 RTM

SQL Server Engineering Team — Wed, 29 Nov 2017 01:49:25 +0000

The 2nd cumulative update release for SQL Server 2017 RTM is now available for download at the Microsoft Downloads site. Please note that registration is no longer required to download Cumulative updates. To learn more about the release or servicing model, please visit:

CU2 KB Article: https://support.microsoft.com/en-us/help/4052574

Starting with SQL Server 2017, we are adopting a new modern servicing model. Please refer to our blog for more details on the Modern Servicing Model for SQL Server.

Microsoft® SQL Server® 2017 RTM Latest Cumulative Update: https://www.microsoft.com/download/details.aspx?id=56128
Update Center for Microsoft SQL Server: http://technet.microsoft.com/en-US/sqlserver/ff803383.aspx