Research Blog: January 2010

Google Research Blog

The latest news from Research at Google

Research Areas of Interest: Building scalable, robust cluster applications

Wednesday, January 27, 2010

Posted by Brad Chen, Technical Lead/Manager

Resource sharing: Stranded resources like idle memory, CPU, and disk bandwidth represent huge capital and operating expenses that deliver no business value. A cluster system based upon the best published research would be likely to leave 50% or more of hardware resources idle. We encourage researchers to explore hardware/software architectures that facilitate more supple sharing to avoid stranded and underutilized computational resources.

Balancing cost, performance, and reliability: Current cluster applications tend to be excessively rigid and brittle, offering only coarse controls to tune the balance between reliability, performance and cost. We envision systems that allow cost to be optimized based on an input specification of performance and reliability requirements. An effective solution might allow service level settings to propagate downward through the layered structure of the system.

Self-maintaining systems: The level of expertise required to troubleshoot today's large systems is one of the biggest barriers to more and larger deployments. The published research in this area has at best marginally reduced the need for such rare expertise. We envision systems that can adapt automatically to changing conditions, in which redundancy and multiple geographically distributed data centers simplify rather than complicate manageability. This will require breakthroughs in monitoring and data analysis to address the diversity of failure modes and simplify the task of keeping systems healthy.


Google Cluster Data

Thursday, January 07, 2010

Posted by Joseph L. Hellerstein, Manager of Google Performance Analytics

Workload characterizations: How can we characterize Google workloads in a way that readily generates synthetic work that is representative of production workloads, so that we can run stand-alone benchmarks?

Predictive models of workload characteristics: What distinguishes a normal workload from an abnormal one? Are there "signals" that indicate problems early enough for automated or manual responses?

New algorithms for machine assignment: How can we assign tasks to machines so that we make best use of machine resources, avoid excess resource contention on machines, and manage power efficiently?

Scalable management of cell work: How should we design the future cell management system to efficiently visualize work in cells, to aid in problem determination, and to provide automation of management tasks?
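To make the machine-assignment question concrete, here is a minimal sketch of a classical baseline, first-fit decreasing bin packing over two resources (cores and memory). It is not Google's scheduler: the function name, the unit machine capacity, and the rule of sorting by the larger of the two demands are all assumptions made for illustration, and the sketch ignores resource contention and power entirely.

```python
from typing import Dict, Optional, Tuple

# Hypothetical types for illustration: a task demand is (cores, memory),
# both normalized so that one machine has capacity (1.0, 1.0).
Demand = Tuple[float, float]


def first_fit_decreasing(
    tasks: Dict[str, Demand],
    num_machines: int,
    capacity: Demand = (1.0, 1.0),
) -> Dict[str, Optional[int]]:
    """Place each task on the first machine with enough free cores and memory.

    Tasks are considered largest first, ranked by their larger demand.
    A task that fits on no machine is mapped to None.
    """
    # Remaining [cores, memory] on each machine.
    free = [list(capacity) for _ in range(num_machines)]
    placement: Dict[str, Optional[int]] = {}

    # Largest tasks first; ties keep their original order.
    order = sorted(tasks, key=lambda t: max(tasks[t]), reverse=True)

    for task_id in order:
        cores, memory = tasks[task_id]
        placement[task_id] = None  # stays None if no machine has room
        for machine, (free_cores, free_memory) in enumerate(free):
            if cores <= free_cores and memory <= free_memory:
                free[machine][0] -= cores
                free[machine][1] -= memory
                placement[task_id] = machine
                break
    return placement


if __name__ == "__main__":
    # Made-up demands, not taken from any published trace.
    demo = {"a": (0.5, 0.25), "b": (0.5, 0.5), "c": (0.25, 0.75)}
    print(first_fit_decreasing(demo, num_machines=2))
```

A baseline like this is useful mainly as a point of comparison: a proposed algorithm can be judged by how many fewer machines it needs, or how much less contention it causes, on the same task set.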


Time (int) - time in seconds since the start of data collection

JobID (int) - Unique identifier of the job to which this task belongs

TaskID (int) - Unique identifier of the executing task

Job Type (0, 1, 2, 3) - class of job (a categorization of work)

Normalized Task Cores (float) - normalized value of the average number of cores used by the task

Normalized Task Memory (float) - normalized value of the average memory consumed by the task
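As a rough illustration of working with these fields, the sketch below parses one record into a typed structure and averages normalized core usage per job type. The on-disk layout is an assumption: it treats each record as one line of six whitespace-separated values in the order listed above. The class and function names are hypothetical, and the sample values in the usage section are invented.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class TaskSample:
    time: int           # seconds since the start of data collection
    job_id: int         # job to which this task belongs
    task_id: int        # the executing task
    job_type: int       # class of job: 0, 1, 2, or 3
    norm_cores: float   # normalized average number of cores used
    norm_memory: float  # normalized average memory consumed


def parse_sample(line: str) -> TaskSample:
    """Parse one record, assuming six whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    job_type = int(fields[3])
    if job_type not in (0, 1, 2, 3):
        raise ValueError(f"unexpected job type: {job_type}")
    return TaskSample(
        time=int(fields[0]),
        job_id=int(fields[1]),
        task_id=int(fields[2]),
        job_type=job_type,
        norm_cores=float(fields[4]),
        norm_memory=float(fields[5]),
    )


def mean_cores_by_job_type(samples: Iterable[TaskSample]) -> Dict[int, float]:
    """Average the normalized core usage of the samples in each job type."""
    totals: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for s in samples:
        totals[s.job_type] = totals.get(s.job_type, 0.0) + s.norm_cores
        counts[s.job_type] = counts.get(s.job_type, 0) + 1
    return {job_type: totals[job_type] / counts[job_type] for job_type in totals}


if __name__ == "__main__":
    # Invented records for demonstration only.
    lines = ["90000 17 42 1 0.25 0.5", "90300 17 43 1 0.75 0.5"]
    print(mean_cores_by_job_type(parse_sample(l) for l in lines))
```

Because the core and memory values are normalized, results such as the per-type mean are only meaningful relative to one another, not as absolute core counts or byte sizes.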
