For 8 years now, we've maintained a list of local R user groups here at the Revolutions blog. This is a list that began with a single group (the Bay Area RUG, the first and still one of the largest groups), and now includes 360 user groups worldwide (including 27 specifically for women).
As the list has grown in size, it's become harder to manage. Thankfully, Colin Gillespie of Jumping Rivers Consulting has risen to the task by creating a new website based on a GitHub repository that anyone can contribute to. I've updated the Local R User Group Directory to point to these new pages, specifically the lists of:
If you have a group of your own, contributing to the list is easy. All you need is a GitHub account, and you can click the Edit button to edit one of the R Markdown pages directly. If you're not familiar with R Markdown, you can also suggest an edit via the Issues page.
(Incidentally, it would be great to automate the process of generating a count and a map of local R user groups. If anyone wants to take up the challenge of writing an R script to process the Rmd pages, please do!)
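As a starting point for that challenge, here is a minimal sketch of the counting step only. It assumes each group appears in the Rmd source as a bullet line containing a Markdown link; that layout is my assumption, not something checked against the repository. The helper name count_groups and the sample lines are made up for illustration.

```r
# Hypothetical helper: count bullet lines that contain a Markdown link,
# e.g. "* [Some RUG](https://example.org)". The bullet-plus-link layout
# is an assumption about how the Rmd pages are written.
count_groups <- function(lines) {
  is_group <- grepl("^\\s*[*-]\\s+\\[.+\\]\\(.+\\)", lines)
  sum(is_group)
}

# Made-up sample standing in for the contents of one Rmd page
sample_rmd <- c(
  "## North America",
  "",
  "* [Example RUG One](https://example.org/one)",
  "* [Example RUG Two](https://example.org/two)",
  "Some explanatory text.",
  "- [Example RUG Three](https://example.org/three)"
)

n_groups <- count_groups(sample_rmd)
print(n_groups)
```

To run this over real pages, one could read each file with readLines() after listing them with list.files(pattern = "\\.Rmd$") and sum the per-file counts. Producing the map would additionally require geocoding each group's city, which this sketch does not attempt.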
As R grows in popularity, it's awesome to see local communities get together and form these groups. If you'd like to start one yourself, here are some tips on starting up a local user group.
In the follow-up to the useR! conference in Stanford last year, the Women in R Task Force took the opportunity to survey the 900-or-so participants about their backgrounds, experiences and interests. With 455 responses, the recently published results provide an interesting snapshot of the R community (or at least that subset able to travel to the US and able to register before the conference sold out). Among the findings (these are summaries; check the report for the detailed breakdowns):
33% of attendees identified as women
26% of attendees identified as other than White or Caucasian
5% of attendees identified as LGBTQ
The report also includes some interesting demographic analysis of the attendees, including the map of home country distribution shown below. The report also offers recommendations for future conferences, one of which has already been implemented: the useR!2017 conference in Brussels will offer child-care for the first time.
Relatedly, the Women in R Task Force has since expanded its remit to promoting other under-represented groups in the R community as well. To reflect the new focus, the task force is now called Forwards, and invites members of the R community to participate. If you have an interest in supporting diversity in race, gender, sexuality, class or disability, follow that link to get in touch.
Two groups are making an impact in improving the gender diversity of R users worldwide. The R-Ladies organization is creating chapters worldwide to help female R programmers meet and work together, and the Taskforce on Women in the R Community is working to improve the participation and experience of women in the R community.
R has more participation by women than many programming communities, but there's still a long way to go towards equity: the R Foundation's Women in R Taskforce estimates that between 11% and 15% of R package authors are women. (The count is based on package author first names; some manual corrections were needed because Hadley is categorized by genderizeR as a female name.)
In 2012, Gabriela de Queiroz founded the first women-focused R user group in San Francisco. Since then, the "R-Ladies" concept has expanded to a global franchise in eleven cities:
This is a great first step to increasing the participation by women in a currently male-dominated field. As the R-Ladies leadership note in their grant application to the R Consortium (which was funded in July):
The R community suffers from an underrepresentation of women in every role and area of participation, whether as leaders, package developers, conference speakers, conference participants, educators, or users. The R community needs to promote the growth of this major untapped demographic by proactively supporting women to fulfill their potential, thus enabling and achieving greater participation.
In addition to the groups created by R-Ladies, the Women in R Task Force is making strides towards achieving these goals. For example, the next useR! conference in Brussels will strive for gender balance amongst invited speakers, tutors, and committee and session chairs, and ensure participation by women on panel discussions. Childcare will also be provided at the conference, and gender statistics will be published on the website.
For more information on R-Ladies (including contacts to help you create a local chapter in your area), and the Women in R Taskforce, follow the links below.
For quite a few years now we have attempted to maintain the Revolution Analytics Local R User Group Directory as the complete and authoritative list of R user groups. Meetup groups make this list in one of two ways: we discover the group because they have a web page of some sort proclaiming the group to be focused on the R language, or someone from the group writes to us asking to have the group included in our list. We have deliberately pursued a relatively conservative strategy in growing the list, resisting the temptation to include every data science user group that may have had an R-related presentation. Even so, we have been pleased to note the slow but steady growth of R user groups, and delighted to see quite a few relatively new meetup groups from South America and Africa make our list which, as of today, includes 235 groups.
However, it is interesting to occasionally take a broader view. Meetup.com, a very popular site for hosting user meetups of all stripes, lists a number of groups who identify with the keyword "R Project For Statistical Computing" in their "We're About" section. Using the site's tools to filter on this keyword will bring you to a page (subject to daily fluctuations) containing somewhere in the neighborhood of 358 meetups spread over 223 cities and 55 countries. The following plot displays the top 25 meetups by number of members for May 15, 2016.
Most of these are indeed R user groups, or at least data science user groups with an interest in R. However, the presence of Find a Tech Job in London in the top 25 indicates that interest in R is spreading to a somewhat wider audience throughout the worldwide tech culture. So, while the Local R User Group Directory is still likely to be your best bet for finding a hard-core RUG, meetup.com may lead you to an R conversation in some surprising places.
I always think of Strata Hadoop World and Predictive Analytics World as initiating the Spring conference season here in the San Francisco Bay Area. The rainy season is usually over by the end of March and it is a perfect time to visit. If you are traveling to either of these conferences from out of town and you are an R aficionado, please arrange your plans to include one or both of the Bay Area useR Group (BARUG) meetings that will be held in conjunction with these events.
For the past few years, the folks at O'Reilly and the organizers of PAW have very generously arranged for us to hold meetings on-site at the conference locations. This year, even though the two BARUG meetings will be held within a week of each other, we have managed to assemble a panel of speakers that I expect will rival the lineup at the conferences themselves.
Our San Jose Strata meeting will be held on March 29th. David Smith will begin the evening by describing how R is being integrated into the warp and woof of Microsoft's offerings. Hadley Wickham will present his ideas about managing many models with the help of the tidyr, purrr and broom R packages, and H2O's Erin LeDell will follow up on Hadley's theme and talk about model management with model stacking, or "super learning" as it is sometimes called. This recent video of Hadley speaking at the WOMBAT conference in February covers some of the themes that he will be presenting.
We have assembled an even more diverse and eclectic lineup of speakers for our April 4th Predictive Analytics World meeting in San Francisco. Keith Everett will present the GLMNet model behind his Oscar predictions. Then Megan Price, a long-time member and supporter of BARUG, will present the statistical techniques and R packages that the Human Rights Data Analysis Group (HRDAG) uses to compensate for the missing data problems that plague attempts to estimate conflict-related casualties in Syria. This will be Megan's first BARUG talk since becoming Executive Director at HRDAG.
After Megan, Blair Hull, legendary trader, founder and managing partner of Ketchum Trading, will talk about market timing, big data and machine learning. Blair is all about algorithms and Ketchum Trading was a founding member of the R Consortium.
Max Kuhn, co-author of Applied Predictive Modeling and frequent BARUG speaker, will close out the evening with a talk on rule-based regression models.
If you are planning on attending the meetings, please register as soon as you can. Space will be limited.
For details and registration for the San Jose Strata meeting look here, and for the San Francisco PAW meeting go here. Also note that both Strata and PAW are offering conference discounts for BARUG members. The discount details are on the meeting sites linked to above.
Finally, if you haven't seen the "Predict This" dance video by PAW's Eric Siegel, you can find it here.
Earlier this month the Bay Area useR Group (BARUG) held its annual lightning talk meeting. This is by far our most popular meeting format: eight 15-minute talks (12 minutes speaking and 3 minutes of Q&A while the next speaker is setting up) packed into a two-hour time slot. The intensity seems to really energize the speakers and engage the audience.
Bradley Shanrock-Solberg kicked off the event with a delightful example of an R Monte Carlo simulation based on his wildpoker package, which you can find on CRAN. I have never seen a more prepared lightning talk presenter: high energy, a royal flush presentation and a four-color printed handout just in case you had trouble keeping up with him for the 12 minutes. In a series of well-conceived plots Bradley showed how, for a number of different poker variations, the best hand changes as the game progresses. The number of players who start the game, the number who stay until the showdown, wildcards and many more contingent events dynamically change the value of your hand. Bradley is definitely the guy for your next trip to Vegas.
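For readers who have not seen a Monte Carlo simulation of card hands, here is a minimal base R sketch of the general idea. It does not use the wildpoker package or reproduce anything from Bradley's talk; it simply estimates, by repeated random dealing, the chance that a five-card hand from a standard 52-card deck contains at least two cards of the same rank, and compares the estimate with the exact value. The names has_pair and n_sims are mine.

```r
set.seed(42)  # make the simulation reproducible

# A 52-card deck represented by rank only (suits do not matter here)
ranks <- rep(1:13, times = 4)

# TRUE if at least two cards in the hand share a rank
has_pair <- function(hand) any(duplicated(hand))

# Deal many random 5-card hands without replacement and record the result
n_sims <- 20000
hits <- replicate(n_sims, has_pair(sample(ranks, 5)))

estimate <- mean(hits)

# Exact probability: 1 minus the chance that all five ranks differ
exact <- 1 - choose(13, 5) * 4^5 / choose(52, 5)

print(c(estimate = estimate, exact = exact))
```

The exact value is a little under one half, and the simulated estimate should land close to it. Adding wildcards or more players makes the exact calculation much harder, which is where simulation earns its keep.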
William Sundstrom, professor of Economics at Santa Clara University, gave an entertaining and thought-provoking presentation on teaching R to undergraduate Econometrics students. One interesting observation that generated some discussion was that even though today's students are "digital natives", having grown up using intelligent devices of all kinds, many of them are nevertheless "digital naïfs". The following slide, a reprint of an email from one of Professor Sundstrom's students, captures some of their frustration.
David Ouyang MD, a Stanford resident who sometimes puts in 73+ hour work weeks, presented some explorations of the Epic electronic medical records data set he is analyzing in his spare time. The following plot shows the distribution of physicians' interactions with the Epic system over the course of a day.
Dennis Noren, a long-time BARUG member and contributor, showed some results from a recommendation app he is building on top of The Movie Database. The following slide from his presentation shows a Shiny dashboard he built to drive a parallel plot. The interactivity really makes the plot useful.
If you are thinking about betting on the Academy Awards you might want to consider Keith Everett's predictions based on his GLMNet Model.
Nelson Auner talked about Modern KPI Tracking in R. My favorite slide describes the behavior of Execs who don't work for companies that sell BI Software:
And that wasn't all of it! We also had an update on data.table from Matt Dowle himself, and an introduction to exploring large data sets with Apache Spark from Hossein Falaki.
Last month I wrote about how several R user groups were making use of GitHub and listed some sites that I thought had interesting material. A few readers were kind enough to point out sites that I had missed; so I would just like to give a couple of "shout outs" here. First of all, I should acknowledge LondonR as a leader in R user group web properties. Their website is very nicely done and valuable. The links to the presentations made to LondonR that they have collected comprise quite a resource to the community.
The following slide which comes from the first presentation on the list, the workshop on Network Analysis in R, is an example of what you can find there.
The repository for the Berlin R User Group also shows great promise. BerlinRUG is beginning to build out a clean, functional webpage and populate it with some very nice material. Honorable mention should go to R-Ladies of San Francisco, who are searching other GitHub sites for useful material like this data science tutorial from Jonathan Bower, SevillaR, and the Las Vegas R Users group.
The first meeting of R users in Poland took place in Wroclaw in 2008. It was a one-day conference with 27 participants and 6 talks.
Today, we have three large groups of R users in major Polish cities (according to meetup.com there are 640 users in SER - Warsaw, 235 in eRka - Cracow and 64 in PAZUR - Poznań). There are also some data science groups that are full of R users anyway (450 users in Wroclaw and 115 in Łódź).
We have tried different forms of meetings. Some of them were working; some of them were not. Below I will summarize what we have learnt from them.
Today, the most common formula for our meetings is two short talks (around 30 minutes each) with a 30-minute break for pizza and networking. The main advantage of this formula is that it is very easy to organize. Just find a large room, order pizza or other snacks and find speakers. You can meet even without speakers, but with them the meeting is much nicer.
In Warsaw we have meetings every month. They usually start at 6pm. Currently we are using a classroom at Warsaw Technical University. Previously we met in pubs, coffee shops and other places, but it eventually turned out that we need more space, a good projector, good access to public transport, and a place that is fairly quiet (in a coffee shop you sometimes compete with noise from the coffee maker).
Among the attendees there are students and some staff (just a few people), but the majority of participants are graduates who now work in data-oriented businesses. So 6pm is a good starting time, since we can meet after work. We finish around 8pm, a good time for an after-party.
We have also organised hackathons (two in Warsaw, three in Cracow - watch the promo), which were pretty cool, but such meetings require much more preparation and organization. Some of them resulted in interesting outcomes, like this diagram of flow between parties in Poland in the previous term of the Sejm. And there is fun in working with completely new people. After a few hours it turns out that you can easily collaborate and you have learned a lot.
The talks are very diverse. Some are in Polish, some in English; some are technical, some applied, and some related to methodology. Thirty minutes per talk turns out to be enough to get people interested and ignite discussions.
Once we tried a cinema-like meeting. Together (around 40 people) we watched videos about deep networks, ate pizza and shared experience related to what we had heard. It wasn't bad, but in the end it is much better to have a live speaker.
All in all, I think that it is good to try different forms. It is interesting to see what works and what does not.
The key ingredient is, of course, the speakers. So I would just like to send thanks to our recent roster.
The community is large and has big dreams. In 2014, we had a three-day Polish R conference called PAZUR. This year (October 12-14) we are going to organize the European R users meeting (eRum) in Poznań. So be prepared to meet us at eRum!
Quite a few times over the past few years I have highlighted presentations posted by R user groups on their websites and recommended these sites as a source for interesting material, but I have never thought to see what the user groups were doing on GitHub. As you might expect, many people who make presentations at R user group meetings make their code available on GitHub. However, as best I can tell, only a few R user groups are maintaining GitHub sites under the user group name.
The Indy UseR Group is one that seems to be making very good use of their GitHub site. Here is the link to a very nice tutorial from Shankar Vaidyaraman on using the rvest package to do some web scraping with R. The following code, which scrapes the first page from Springer's Use R! series to produce a short list of books, comes from Shankar's simple example.
# load libraries
library(rvest)
library(dplyr)
library(stringr)

# link to Use R! titles at Springer site
useRlink = "http://www.springer.com/?SGWID=0-102-24-0-0&series=Use+R&sortOrder=relevance&searchType=ADVANCED_CDA&searchScope=editions&queryText=Use+R"

# Read the page
userPg = useRlink %>% read_html()

## Get info of books displayed on the page
booktitles = userPg %>% html_nodes(".productGraphic img") %>% html_attr("alt")
bookyr = userPg %>% html_nodes(xpath = "//span[contains(@class,'renditionDescription')]") %>% html_text()
bookauth = userPg %>% html_nodes("span[class = 'displayBlock']") %>% html_text()
bookprice = userPg %>% html_nodes(xpath = "//div[@class = 'bookListPriceContainer']//span[2]") %>% html_text()
pgdf = data.frame(title = booktitles, pubyr = bookyr, auth = bookauth, price = bookprice)
pgdf
This plot, which shows a list of books ranked by number of downloads, comes from Shankar's extended recommender example.
I am particularly impressed with the way they have integrated news, content and commentary into their "News" section. Scroll down the page and have a look at the care taken to describe and document the presentations made to the group. I found the introduction and slides for Bob Carpenter's RStan presentation very well done.
If your R user group is on GitHub and I have not included you in my short list, please let me know about it. I think RUG GitHub sites have the potential for creating a rich code sharing experience among user groups. If you would like some help getting started with GitHub, have a look at the tutorials on the Murdoch University R User Group webpage.
By Virgilio Gómez Rubio, Spanish R Users Organizing Committee
As every autumn since 2009, Spanish R users gathered at their annual meeting. It was organised by the Spanish R users group 'Comunidad R-Hispano' and took place on 5-6 November in the historic city of Salamanca. The 7th Meeting of Spanish R Users attracted more than 100 R enthusiasts and provided a mix of tutorials and contributed talks on the premises of the University of Salamanca.
Contributed talks focused on four main areas: Applications, Interfaces/Data Mining, Statistical Methodology and Biostatistics. Altogether, there were 22 oral presentations on these topics.
Among the Applications, Marcos Fernández Arias showed how to use R to gather information available online in order to pay a fair price for a new car. Teresa González Arteaga also shared some of her teaching experiences with R in a Degree in Statistics. In the Interfaces/Data Mining section, Christian González Martel and co-authors explored using R to analyze Wikipedia searches for top Spanish companies as a guide to investment in the Spanish stock market.
Regarding the contributed session on Statistical Methodology, José Luis Vicente Villardón talked about his experience migrating his code for multivariate analysis using biplots from Matlab to R. Finally, in the Biostatistics section, Carlos Prieto and co-authors presented some interactive plots developed with R and other tools to visualize genomic data.
The prize for the best presentation by a young presenter was awarded to Karel López Quintero for his work on a Price Sensitivity Meter (PSM) with R.
Comunidad R-Hispano is already preparing the 8th Meeting, which will take place in November 2016 at the University of Castilla-La Mancha in Albacete. It will be locally organised by the same team that took care of useR! 2013.