Data Speaker Series: Thorsten Dietzsch on Building Data Products at Zalando

by Sirin Odrowski on August 15, 2017

In the first of our Data Speaker Series posts, Thorsten Dietzsch shares how data products are managed at Zalando, a fashion ecommerce company.

How Communication Density Fuels Automattic

by Demet Dagdelen on August 8, 2017

How does meeting in person affect our interpersonal communication at Automattic? Demet Dagdelen reveals all.

This Week in Data Reading

by Krista Stevens on July 25, 2017 (updated July 26, 2017)

This week, Sirin, Boris, and Demet have some recommended reading for you in the fields of descriptive data analysis, machine learning, and artificial intelligence.

Real-Time Elasticsearch Indexing on WordPress.com

by Greg Ichneumon Brown on July 11, 2017

Love databases, indexing, and Elasticsearch gymnastics? Greg Brown walks us through the indexing sausage factory on WordPress.com.

data.coalesce() — Automattic Data Division Meets in Montréal

by Carly Stambaugh on June 29, 2017

Want to know what Automattic data wranglers do when they meet up? Carly Stambaugh takes you behind the scenes.

Time Series Analysis: When “Good Enough” is Good Enough

by Boris Gorelik on June 12, 2017 (updated June 13, 2017)

Anomaly detection and time series forecasting are valuable for monitoring the financial and technical health of an organization. Proper modeling of time series requires accounting for periodic fluctuations; malicious users; data irregularity, saturation, or scarcity; and sudden peaks and drops. To account for these factors, the modeler needs to select the proper model family, optimize the model parameters, validate the assumptions, and refine the process as needed. The task is even more complicated when one needs to build a self-service application that supports "slicing and dicing" any metric into its underlying components. In such a case, where the possible models number in the thousands, manual tuning is impossible. In this lecture, I show how a series of assumptions and simplifications allowed me to complete the modeling task in one week using open source Python packages. I will review all the assumptions, their implications, and their limitations. I will also show which modeling approaches worked and which didn't in the case of Automattic, the company behind WordPress.com, Jetpack, and other projects, which serves more than 180,000,000 unique visitors a month in the US alone. I hope that this information will be useful to many data-driven organizations.

This Week in Data Reading

by Krista Stevens on June 6, 2017 (updated June 7, 2017)

This week, Charles, Xiao, Chris, and Boris share pieces on machine learning, MySQL, pop lyrics, and career advice.