All things data: Big, not so big, and small...


Wednesday, November 2, 2016

Logging challenges for containerized applications: Interview with Eduardo Silva

Next week, another edition of the CloudNativeCon conference will take place in the great city of Seattle. One of the key topics this year is containers, a software technology that is enabling and easing the development and deployment of applications by encapsulating them so they can be deployed through a single, simple process.

In this installment, we took the opportunity to chat with Eduardo Silva a bit about containers and his upcoming session, Logging for Containers, which will take place during the conference.

Eduardo Silva is a principal Open Source developer at Treasure Data Inc., where he currently leads the efforts to make the logging ecosystem friendlier for embedded systems, containers, and cloud services.

He also directs the Monkey Project organization, which is behind the open source projects Monkey HTTP Server and Duda I/O.

A well-known speaker, Eduardo has presented at events across South America and at recent Linux Foundation events in the US, Asia, and Europe.

Thanks so much for your time, Eduardo!

What is a container and how is it applied specifically in Linux?

When deploying applications, it is always desirable to have full control over the resources they are given, and ideally we want each application to be isolated as much as possible. Containers are the concept of packaging an application with its entire runtime environment in an isolated way.
To accomplish this at the operating system level, Linux provides us with two features that make it possible to implement the concept of containers: cgroups and namespaces.

  • cgroups (control groups) allow us to limit resource usage for one or more processes, so you can define how much CPU or memory a program may use when running (see the sketch after this list).
  • namespaces, on the other hand (associated with users and groups), allow us to define restricted access to specific resources such as mount points, network devices, and IPC, among others.
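As a rough illustration of the cgroups point above, here is a minimal sketch of my own (not something from the interview): a C program that creates a cgroup under the v1 memory controller, caps it at roughly 50 MB, and moves itself into the group before launching a shell. The paths assume a cgroup v1 hierarchy mounted at /sys/fs/cgroup/memory and root privileges; cgroup v2 exposes a different layout.

    /* Cap a process at ~50 MB using the cgroup v1 memory controller.
       Assumes /sys/fs/cgroup/memory exists (cgroup v1) and root privileges. */
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>

    static int write_file(const char *path, const char *value)
    {
        FILE *f = fopen(path, "w");
        if (!f) { perror(path); return -1; }
        fprintf(f, "%s", value);
        fclose(f);
        return 0;
    }

    int main(void)
    {
        char pid[32];

        /* Create a new cgroup named "demo" under the memory controller. */
        if (mkdir("/sys/fs/cgroup/memory/demo", 0755) != 0)
            perror("mkdir");  /* it may already exist; keep going */

        /* Limit every process placed in this cgroup to 50 MB (52428800 bytes). */
        write_file("/sys/fs/cgroup/memory/demo/memory.limit_in_bytes", "52428800");

        /* Move the current process into the cgroup... */
        snprintf(pid, sizeof(pid), "%d", (int)getpid());
        write_file("/sys/fs/cgroup/memory/demo/cgroup.procs", pid);

        /* ...and everything started from this shell inherits the limit. */
        execlp("/bin/sh", "/bin/sh", (char *)NULL);
        perror("execlp");
        return 1;
    }

Any allocation beyond the limit inside that shell is refused or reclaimed by the kernel, which is exactly the kind of resource fencing container runtimes set up for you.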

In short, if you like programming, you can implement your own containers with a few system calls. Since this would be tedious work from an operability perspective, there are libraries and services that abstract away the details and let you focus on what really matters: deployment and monitoring.
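To make the "few system calls" remark concrete, here is another small sketch of my own, under the same caveat that a real container runtime layers mount namespaces, networking, cgroups, and an image format on top: a C program that uses clone() to start a shell inside fresh UTS and PID namespaces, so the child sees its own hostname and runs as PID 1. It needs root (or the equivalent capabilities) to create the namespaces.

    /* Start a shell in its own UTS and PID namespaces via clone().
       Run as root; a toy "container" with no mounts, network, or cgroup limits. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define STACK_SIZE (1024 * 1024)
    static char child_stack[STACK_SIZE];

    /* Entry point of the "containerized" process. */
    static int child_fn(void *arg)
    {
        (void)arg;
        sethostname("mini-container", strlen("mini-container")); /* private hostname */
        printf("PID as seen inside the namespace: %ld\n", (long)getpid()); /* prints 1 */
        execlp("/bin/sh", "/bin/sh", (char *)NULL);
        perror("execlp"); /* only reached if exec fails */
        return 1;
    }

    int main(void)
    {
        /* CLONE_NEWUTS: own hostname; CLONE_NEWPID: own PID space. */
        pid_t pid = clone(child_fn, child_stack + STACK_SIZE,
                          CLONE_NEWUTS | CLONE_NEWPID | SIGCHLD, NULL);
        if (pid == -1) {
            perror("clone");
            exit(EXIT_FAILURE);
        }
        waitpid(pid, NULL, 0);
        return 0;
    }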

So, what is the difference between a Linux container and, for example, a virtual machine?

A container aims to be a granular unit of an application and its dependencies; it's a single process or a group of processes. A virtual machine runs a whole operating system, which, as you might guess, is a bit heavier.

So, weighing containers against virtualization, could you tell us a couple of advantages and disadvantages of each?

There are many differences, pros and cons. Taking into account our cloud world, where you need to deploy applications at scale (and many times just on demand), containers are the best choice: deploying a container takes a small fraction of a second, while deploying a virtual machine may take a few seconds plus a bunch of resources that will most likely be wasted.

Given the opportunities this technology brings, there are several container projects and solutions out there, such as LXC, LXD, and LXCFS. Could you share with us the differences between them? Do you have one you consider your main choice, and why?

Having the technology to implement containers is the first step, but as I said before, not everybody wants to play with system calls; instead, different technologies exist to create and manage containers. LXC and LXD provide the next level of abstraction for managing containers, while LXCFS is a user-space file system for containers (it works on top of FUSE).
Since I don't play with containers at a low level, I don't have a strong preference.

And what about solutions such as Docker, CoreOS or Vagrant? Any take on them?

Docker is the big player nowadays; it provides good security and mechanisms to manage and deploy containers. CoreOS has a prominent container engine called Rocket (rkt). I have not used it, but it looks promising in terms of design and implementation, and orchestration services like Kubernetes are already providing support for it.

You are also working on a quite interesting project called Fluent-Bit. What is the project about?

I will give you a bit of context. I'm part of the open source engineering team at Treasure Data; our primary focus as a team is to solve data collection and data delivery for a wide range of use cases and integrations, and that is what Fluentd exists for. It's a very successful project that nowadays is solving logging challenges in hundreds of thousands of systems, and we are very proud of it.
A year ago we decided to dig into the embedded Linux space, and as you might know, the capacity of these devices in terms of CPU, memory, and storage is usually more restricted than that of a common server machine.
Fluentd is really good, but it also has its technical requirements: it's written in a mix of Ruby + C, and having Ruby on most embedded Linux systems can be a real challenge or a blocker. That's why a new solution was born: Fluent Bit.
Fluent Bit is a data collector and log shipper written 100% in C. It has a strong focus on Linux, but it also works on BSD-based systems, including OSX/macOS. Its architecture has been designed to be very lightweight and to provide high performance from collection to distribution.
Some of its features are:

  • Input / Output plugins
  • Event driven (async I/O operations)
  • Built-in Metrics
  • Security: SSL/TLS
  • Routing
  • Buffering
  • Fluentd Integration

Although it was initially conceived for embedded Linux, Fluent Bit has evolved, gaining features that make it cloud friendly without sacrificing its performance and lightweight goals.
If you are interested in collecting data and delivering it somewhere, Fluent Bit allows you to do that through its built-in plugins, some of which are:

  • Input
    • Forward: a protocol on top of TCP; gets data from Fluentd or Docker containers.
    • Head: read initial chunks of bytes from a file.
    • Health: check whether a remote TCP server is healthy.
    • kmsg: read Kernel log messages.
    • CPU: collect CPU usage metrics, globally and per core.
    • Mem: memory usage of the system or of a specific running process.
    • TCP: listen for JSON messages over TCP.
  • Output
    • Elasticsearch database
    • Treasure Data (our cloud analytics platform)
    • NATS Messaging Server
    • HTTP end-point

So as you can see, with Fluent Bit it would be easy to aggregate Docker logs into Elasticsearch, monitor your current OS resource usage, or collect JSON data over the network (TCP) and send it to your own HTTP end-point.
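As a concrete example of the TCP use case just mentioned, here is a small sketch of my own (not from the interview) that pushes one JSON record to a Fluent Bit TCP input. The address 127.0.0.1:5170 is an illustrative assumption; it depends on how the TCP input is configured on your side, and Fluent Bit then routes the record to whichever output plugin is enabled, such as Elasticsearch or an HTTP end-point.

    /* Send one JSON record to a Fluent Bit TCP input (assumed at 127.0.0.1:5170). */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        const char *msg = "{\"service\": \"demo\", \"status\": \"ok\", \"latency_ms\": 42}\n";

        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        struct sockaddr_in addr = { 0 };
        addr.sin_family = AF_INET;
        addr.sin_port = htons(5170);                 /* illustrative port */
        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("connect");
            close(fd);
            return 1;
        }

        /* One JSON message, newline-terminated; routing to an output
           plugin (Elasticsearch, HTTP, etc.) is Fluent Bit's job. */
        write(fd, msg, strlen(msg));
        close(fd);
        return 0;
    }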
There are many use cases, and this is a very exciting tool, not just from an end-user perspective but also from a technical implementation point of view.
The project is moving forward pretty quickly and getting exceptional new features, such as support for writing your own plugins in Golang (yes, C -> Go). Isn't that neat?

You will be presenting at the CNCF event CloudNativeCon & KubeCon in November. Can you share with us a bit about what you will be presenting in your session?

I will share our experience with logging in critical environments and dig into common pains and best practices that can be applied to different scenarios.
It will cover everything about logging in the scope of (but not limited to) containers, microservices, distributed logging, aggregation patterns, Kubernetes, and open source solutions for logging, along with demos.
I'd say that everyone who is a sysadmin, devops engineer, or developer will definitely benefit from the content of this session; logging is, and is required, everywhere.

Finally, on a personal note: which do you consider to be the geekiest songs of this century?

That's a difficult question!
I am not an expert on geek music, but I would vouch for "Spybreak!" by Propellerheads (from The Matrix).




Monday, October 31, 2016

Teradata Partners Conference 2016: Teradata Everywhere


Our technologized society is becoming opaque.
As technology becomes more ubiquitous and our relationship with digital devices ever
more seamless, our technical infrastructure seems to be increasingly intangible.
- Honor Harger


An idea that I could sense was in the air during my last meeting with Teradata’s crew in California, during their last influencer event, was confirmed and reaffirmed a couple of weeks ago during Teradata’s big partner conference: Teradata is now in full-fledged transformational mode.

Of course, for companies like Teradata that are used to being on the front line of the software industry, particularly in the data management space, transformation has now become much more than a “nice to do”. These days it’s pretty much the lifeblood of any organization at the top of the software food chain.

If they want to stay at the top, these companies have the complicated mandate of being fast and smart enough to provide the software, the methods, and the means that enable customers to gain technology and business improvements, along with the value that results from these changes.

And while it seems Teradata has taken its time with this transformation, it is also evident that the company is taking it very seriously. Will this be enough to keep pace with peer vendors within a very active, competitive, and transformational market? Well, it’s hard to say, but certainly, with a number of defined steps, Teradata looks like it will be able to meet its goal of remaining a key player in the data management and analytics industry.

Here we take an up-to-date look at Teradata’s business and technology strategy, including its flexible approach to deployment and ability for consistent and coherent analytics over all types of deployment, platforms, and sources of data; and then explore what the changes mean for the company and its current and future customers.

The Sentient Enterprise
As explained in detail in a previous installment, Teradata has developed a new approach towards the adoption of analytics, called the “sentient enterprise.” This approach aims to guide companies to:


  • improve their data agility
  • adopt a behavioral data platform
  • adopt an analytical application platform
  • adopt an autonomous decision platform


While we won’t give a full explanation of the model here (see the video below or my recent article on Teradata for a fuller description of the approach), there is no doubt that this is a crucial pillar for Teradata’s transformational process, as it forms the backbone of Teradata‘s approach to analytics and data management.

Teradata Video: The Sentient Enterprise

As mentioned in the previous post, one aspect of the “sentient enterprise” approach from Teradata that I particularly like is the “methodology before technology” aspect, which focuses on scoping the business problem, then selecting the right analytics methodology, and at the end choosing the right tools and technology (including tools such as automatic creation models and scoring datasets).

Teradata Everywhere
Another core element of the new Teradata approach consists of spreading its database offering wide, i.e., making it available everywhere, especially in the cloud. This movement involves putting Teradata’s powerful analytics to work. Teradata Database will now be available in different delivery modes and via different providers, including on:


  • Amazon Web Services—Teradata Database will be available in a massively parallel processing (MPP) configuration, scalable to up to 32 nodes, including services such as node failure recovery and backup, as well as restoring and querying data in Amazon’s Simple Storage Service (S3). The system will be available in more than ten geographic regions.
  • Microsoft’s Azure—Teradata Database is expected to be available by Q4 of 2016 in the Microsoft Azure Marketplace. It will be offered with MPP (massively parallel processing) features and scalability for up to 32 nodes.
  • VMWare—via the Teradata Virtual Machine Edition (TVME), users have the option of deploying a virtual machine edition of Teradata Database for virtual environments and infrastructures.
  • Teradata Database as a Service—Extended availability for the Teradata Database will be available to customers in Europe through a data center hosted in Germany.

  • Teradata’s own on-premises IntelliFlex platform.


Availability of Teradata Database on different platforms

Borderless Analytics and Hybrid Clouds
The third element in the new Teradata Database picture involves a comprehensive provision of analytics despite the delivery mode chosen, an offering which fits the reality of many organizations—a hybrid environment consisting of both on-premises and cloud offerings.

With a strategy called Borderless Analytics, Teradata allows customers to deploy comprehensive analytics solutions within a single analytics framework. Enabled by Teradata solutions such as QueryGrid, its multi-source SQL and processing engine, and Unity, its orchestration engine for Teradata multi-system environments, this strategy provides a way to perform consistent and coherent analytics over heterogeneous platforms with multiple systems and sources of data, i.e., in the cloud, on premises, or in virtual environments.

At the same time, this is also serving Teradata as a way to set the basis for its larger strategy for addressing the Internet of Things (IoT) market. Teradata is addressing this goal with the release of a set of new offerings called Analytics of Things Accelerators (AoTAs), comprising technology-agnostic intellectual property that emerged from Teradata’s real-life IoT project engagements.

These accelerators can help organizations determine which IoT analytical techniques and sensors to use and trust. Because the AoTAs are enterprise-ready by design, companies can deploy them without having to devise an enterprise scaling approach themselves, and without having to go through time-consuming experimentation phases before deployment to ensure the right analytical techniques have been used. Teradata’s AoTAs accelerate adoption, enabling deployment cost reduction and ensuring reliability. This is a noteworthy effort to provide IoT projects with an effective enterprise analytics approach.

What Does this Mean for Current and Potential Teradata Customers?
Teradata seems to have a concrete, practical, and well-thought-out strategy regarding the delivery of new generation solutions for analytics, focusing on giving omnipresence, agility, and versatility to its analytics offerings, and providing less product dependency and more business focus to its product stack.

But one thing Teradata needs to consider, given the increasing number of solutions available in its portfolio, is making sure it provides clarity and efficiency to customers regarding which blend of solutions to choose. This is especially true when the choice involves increasingly sophisticated big data solutions, a market that is getting “top notch” but is certainly still difficult to navigate, especially for those new to big data.

Teradata’s relatively new leadership team seems to have sensed right away that the company is currently at a very crucial point, both internally and within the industry of providing insights. If its strategy works, Teradata might be able to not only maintain its dominance in this arena but also increase its footprint in an industry destined to expand with the advent of the Internet of Things.

For Teradata’s existing customer base, these moves could be encouraging, as they could mean being able to expand their existing analytics platforms using a single platform, and therefore without friction and with cost savings.

For those considering Teradata as a new option, it means having even more options for deploying end-to-end data management solutions using a single vendor rather than taking a “best of breed” approach. Either way, though, Teradata is pushing towards the future with a new and comprehensive approach to data management and analytics in an effort to remain a key player in this fierce market.

The question is whether Teradata’s strategic moves will resonate effectively within the enterprise market, allowing it to compete with existing software monsters such as Oracle, Microsoft, and SAP.

Are you a Teradata user? If so, let us know what you think in the comments section below.

(Originally published on TEC's Blog)

Monday, October 17, 2016

Salesforce Acquires BeyondCore to Enable Analytics . . . and More



In October of 2014, Salesforce announced the launch of Salesforce Wave, the cloud-based company’s analytics cloud platform. By that time, Salesforce had already realized that competing with the powerful incumbents of the business software arena—the Oracles, SAPs, and IBMs of the world—as they arrived in the cloud at full swing would require it to expand its offerings into the business intelligence (BI) and analytics market and the enterprise mobile app market. Salesforce realized this goal with Wave Analytics and Wave App, Salesforce’s analytics app for Sales Cloud customers.

Since then, there have been many developments in the analytics field and especially in the areas of advanced analytics, where most major software vendors have expanded their analytics offerings to the cloud. These events signal Salesforce’s entry into the never-ending race for the ultimate intelligence solution—or not?

From Salesforce’s Perspective

Cloud-based analytics solution provider BeyondCore had the upper hand over similar vendors, and in August 2016 it was acquired by Salesforce. The company fits most, if not all, of the necessary requirements for fast integration within the Salesforce platform.
But why the acquisition? I asked myself as soon as I heard the news.

I mean Salesforce had already launched its analytics and mobile offerings, including major functional expansions and improvements in 2015 to improve dashboard capabilities and provide better integration with the existing platform as well as with a number of specific partner apps—e.g., Apttus Quote-to-cash Intelligence, FinancialForce ERP Web Apps, and Vlocity Communications Cloud Analytics.

So what was so appealing about BeyondCore that encouraged Salesforce to decide upon this acquisition? First, according to a post written by Arijit Sengupta, still the CEO of BeyondCore, it was during Gartner’s BI Summit that BeyondCore grabbed Salesforce’s attention, while showing the neat integration capabilities of BeyondCore’s next version with the cloud software giant’s platform. For Salesforce, a particular point of interest is of course getting hold of a product that already has a tight set of integration capabilities, which can mitigate user resistance to adoption.

Second, Salesforce knows that the analytics and BI realm is continually expanding. By the time the vendor got its hands on its brand new analytics platform, “disruptive” (boy do I love this term!) and other big vendors were already getting their hands dirty with new trending technologies entering the market: machine learning, enhanced pattern recognition algorithms, improved guided discovery techniques, and even cognitive and artificial intelligence functionality. Salesforce quickly realized that to remain competitive in this ever-changing space, it would need to expand its existing analytics platform to keep pace with the rest.

From the User’s Perspective
The third reason for the acquisition is that BeyondCore’s offering is one to watch closely from the customer’s perspective, if integrated nicely within Salesforce’s platform. With proprietary smart pattern discovery technology, the start-up analytics company aims to ease the often laborious and error-prone process involved in finding useful insights. Specifically, it provides a way to exhaustively examine variable combinations and patterns within large datasets, and automate the statistical analysis process with ready-to-use patented models for defining mathematical relationships within the data.

BeyondCore’s analytics solution provides users with guided exploration and discovery features, along with recommendations to provide help within the complete analytics cycle, such as suggestions for action. A set of collaboration functionality features is available for sharing results and promoting analytics within a collaborative interface, rather than as an isolated data science bunker.

BeyondCore includes a broad set of advanced analytics features (figure 1) that helps build the case for Salesforce’s interest and user appeal. The possibility of having an advanced analytics toolset within the Salesforce platform will appeal to both users and Salesforce itself, enabling the vendor to not only keep pace with industry leaders but also compete head to head with other players in the market.

Figure 1. Major functional features in comparison with other offerings (Source: https://beyondcore.com/product/)

Expecting a Prompt Integration of BeyondCore with Salesforce
While the acquisition has already been completed, not much information has been shared on Salesforce’s internal strategy for fusing BeyondCore within the Salesforce platform.
I would expect Salesforce to promptly fuse all BeyondCore assets to fit nicely with the platform. A couple of these integrations appear to be worth exploring:


  1. Integration with the Salesforce Wave Analytics and Wave App portions of the cloud platform in order to naturally expand the analytics offering
  2. Integration with Salesforce’s new IoT Cloud Service, which would enable Salesforce to offer more robust and modern data management solutions that provide a wide variety of connected services and their analysis

In my view, the BeyondCore acquisition offers Salesforce more than just technology. Whether Salesforce succeeds in integrating the BeyondCore solution will depend on how well the solution interacts with the rest of Salesforce’s platform from a practical and business point of view.

So what?
Finally, I would expect to see Salesforce keep investing fervently in the analytics and AI space. Evidence of this is Salesforce’s recent revelation, by CEO Marc Benioff, of the company’s continued investment in the realm of artificial intelligence (AI) and cognitive computing with its ‘Einstein’ project/product, Salesforce’s effort to introduce AI to its customer relationship management (CRM) product aside from SalesforceIQ.

Certainly, for Salesforce and other vendors in this market, the road ahead will see many changes as, surprisingly, there’s still a lot of consolidation to be done in many areas of the analytics space. This is particularly true for creating a smooth path from analysis to decision making as well as for filling the gap between traditional analytics infrastructures and those emerging from new technologies—to enable faster, better, and more accurate decisions based on current, available data.
(Originally published in TEC's Blog)

Monday, September 19, 2016

IT Sapiens, for Those Who Are Not

Perhaps one of the most refreshing moments in my analyst life is when I get the chance to witness the emergence of new tech companies—innovating and helping small and big organizations alike to solve their problems with data.
This is exactly the case with Latvia-based IT Sapiens, an up-and-coming company focused on helping those small or budget-minded companies to solve their basic yet crucial data reporting issues.

Fact: Not all companies are IT Sapiens
Quite often we let ourselves go and fall for all the hyped technologies and the companies adopting them: Big Data, large analytics deployments, and the like. We often forget that a large number of companies are still facing much more basic problems, such as basic reporting and analytics, thinking these are only a thing of the past.
Well, not too long ago I had the opportunity to speak with IT Sapiens, a start-up company that is still working on the provision of day-to-day reporting and analytics solutions for a large number of small and emerging companies that need practical and easy yet effective analytics solutions, and of course at a more accessible price.
Led by young entrepreneur and analytics expert Eva Narunovska as its chief executive officer (CEO), IT Sapiens has developed solutions for a—to my surprise, I have to admit—large community of users of the open-source customer relationship management (CRM) solutions SugarCRM and Vtiger.
According to Eva’s and her team’s experience, many of these organizations keep struggling with the inherent complications of performing advanced reporting and analysis with these two open-source CRM solutions, not to mention the need for a reporting solution with an optimal performance and overall neat integration with them.
Based on this need, IT Sapiens has developed a solution based on a proprietary SQL engine technology that allows organizations that are not tech savvy to deploy reporting and analytics solutions that work on top of both SugarCRM and Vtiger, enabling them to enhance their analytics and reporting capabilities with a tightly integrated solution.

IT Sapiens: Simplicity, flexibility, and neat integration
The IT Sapiens offering spans three main software products:

The Analytics reporting for SugarCRM
A solution that enables the authoring of dynamic reports and interactive charts based on SugarCRM’s data. The reporting tool includes features to modify reports and charts, sort and group a collection of predefined reports to ease the development process, as well as add an unlimited number of groups.
The tool includes a good-quality, flexible chart and map library (see figure) for developing versatile charts and dashboards.

Figure. SugarCRM Dashboard with IT Sapiens Charts and Maps
The following is a brief demo of IT Sapiens’ reporting for SugarCRM capabilities.



The Advanced reporting for VtigerCRM
A solution that enables advanced reporting and analytics integrated with VtigerCRM. The tool integrates with VtigerCRM and allows users to configure reports and charts, as well as modify and adjust filters, sorting, and grouping criteria.
According to IT Sapiens, the module can be quickly and easily installed by using Vtiger’s Module Manager to enable the reporting tool to be ready in a matter of minutes. As with the version for SugarCRM, the reporting tools come with predefined reports and functionality to let users start using the reporting and analysis tool to, for example, compare sales results and identify sales channels quickly. Some of this functionality can be seen in the following video.



IT Sapiens also offers an Advanced calendar for VtigerCRM, which can be configured and customized to expand Vtiger’s native capabilities, such as quick registry options, easy event search and filtering, and integration with Google Calendar.

IT Sapiens: Key takeaway
Producing practical, unpretentious solutions, IT Sapiens fits well within the small and medium business (SMB) market. The company targets organizations in need of analytics and reporting solutions with minimal information technology (IT) footprints, organizations that are, most of the time, somewhat left behind by the bigger software vendors. IT Sapiens can run both on premises, via a web-based solution that can be deployed on Windows and Linux, and in the cloud for SugarCRM users via Sugar’s cloud partner program, with an accessible subscription-based offering.
One interesting fact about this small tech company is that its growing base of more than 130 customers spans the globe, especially across Europe in countries such as the UK and Germany, where a good number of businesses using open-source software from Sugar and Vtiger seem to find in IT Sapiens a good fit for their reporting and analytics needs.
On the recommendation side, I like the simplicity and neatness of IT Sapiens’ offerings and their integration aspect. It would be nice to see whether IT Sapiens expands its offering to other markets, not necessarily open source. I wouldn’t be surprised if organizations using commercial CRM systems were interested in getting their hands on an analytics and reporting solution that is easy to integrate, deploy, and exploit, with a simplicity and efficiency that can go a long way.
Again, as an analyst, I always find it gratifying to be able to report on what is going on with promising tech companies that are emerging locally and abroad, making a difference by helping customers solve their day-to-day data problems or disrupting the market with new offerings and technologies.


Thursday, September 8, 2016

Influencer Summit 2016—Teradata Reshapes Itself with Analytics and the Cloud


For anyone with even a small amount of understanding regarding current trends in the software industry it will come as no surprise that the great majority of enterprise software companies are focusing on the incorporation of analytics, big data, cloud adoption, and especially the Internet of Things into their software solutions.
In fact, these capabilities have become so ubiquitous that for users and customers the focus is not so much the initial adoption of these technologies by software providers but more how this incorporation is being made and the specific benefits it could have for them.
A couple of weeks ago, I had the chance to attend Teradata’s recent Influencer Summit in San Diego, California, where customer success stories and product updates were provided. The event was particularly interesting as it outlined some of the next steps to be taken by Teradata and how Teradata is clearly reshaping itself towards a future with a very different software ecosystem and business model.

A New Teradata Being Shaped by Analytics and Big Data
Since the two most recent influencer summits, it has been possible to perceive a clear movement towards Teradata’s internal transformation, as a result of:

  •  the support and adoption of Presto, the distributed SQL engine for big data, as well as the launch of Listener, Teradata’s self-service solution for ingesting and distributing fast-moving data streams;
  • key acquisitions, such as cloud-based analytics company Hadapt, information management company Revelytix, and big data consulting firm Think Big; and
  • strong partnerships with big data powerhouses Cloudera, Hortonworks, and MapR.

During his presentation, Teradata’s executive vice president and chief business officer John Dinning made it clear that Teradata is evolving its entire product stack towards a central idea: offering organizations solutions for becoming what Teradata calls a “sentient enterprise,” an organization with a comprehensive view, practice, and use of analytics across the business.
With this view in mind, Teradata is working on a number of product reconfigurations and changes to provide users with a much updated and adapted set of solutions to, in the words of Teradata:

  • achieve repeatable and operationalized analytics, and
  • achieve execution at scale, all
  • at a lower total cost of ownership (TCO).

Towards this end, Teradata is focusing on establishing key industry differentiators to provide multi-genre analytics and ensure modern data integration and an enhanced customer experience at scale (figure 1). Some of the elements Teradata considers necessary to achieve these goals include:

Figure 1. Teradata’s new strategy and view (courtesy of Teradata)

ThinkBig: Proving to Be an Effective Acquisition
Regarding customer success and consulting services, at the conference Teradata described more specifically the reasoning behind its new strategy, and emphasized customer experience from both the support and consulting services perspectives.
Key results can be seen from Teradata’s acquisition of ThinkBig, which has enabled the company to reinforce services tailored to different user groups within an organization and to offer a proven agile methodology called Velocity, which should expand the reach of Teradata’s services to new big data frameworks and architectures (figure 2).
Through these steps, Teradata aims to close many of the existing time/development/deployment gaps companies have in their existing data management frameworks. Teradata wants to provide not just the solutions but a set of consulting and methodology services purposefully aligned to accelerate the development cycle of any new analytics solution, especially big data-oriented solutions.

Figure 2. Teradata’s ThinkBig Velocity agile development model (courtesy of Teradata)

Teradata’s Take on the Future of Analytics and BI: The Sentient Enterprise
Perhaps one of the most important aspects of the path the evolution of analytics has taken is its expansion out of the realm of business intelligence (BI). While companies still recognize the need and value that these “traditional” BI solutions bring, many of them know that there are still gaps to close and that many things can be done to improve and apply best practices in the use of analytics. One is to recognize that analytics can be used in almost all areas of the organization, and another is the expanded use and interrelation of analytics across the organization.
Teradata understands this as a new core requirement for many of its customers. As such, it has developed a strategy whereby it can offer a comprehensive approach to the practice and use of analytics within a data management strategy to close those gaps. Teradata offers a comprehensive framework for this in a concept Teradata calls the sentient enterprise.
Merriam-Webster defines sentient as being “responsive to or conscious of sense impressions, being aware, or being finely sensitive in perception or feeling.”
From the definition above, we can infer that Teradata’s aim is to align its product stack with the goal of providing comprehensive analytics solutions that allow its customers to be aware of and responsive to changing needs. Teradata’s sentient enterprise concept means deploying solutions that function under 5 pillars:

  1. Agile deployment
  2. A behavioral data platform
  3. A collaborative data platform
  4. An analytical application platform
  5. An autonomous decision platform

This strategy relies on the provision of agility to reduce design and deployment times, and also involves a shift to:

  • a hybrid cloud approach—this approach aims to allow customers to configure the best-fit cloud/on-premises configuration for their unique situation (instead of forcing customers to migrate to the cloud)
  • a comprehensive data management approach via Teradata’s unified data architecture (UDA) that allows customers to deploy platforms using a diversity of solutions, both Teradata and non-Teradata (figure 3)
  • expanding the number of supported analytics solutions via Teradata’s in-house analytics offerings and those from its wide number of partners

Another noteworthy aspect of the sentient enterprise approach from Teradata that I particularly liked is the “methodology before technology” aspect, which focuses on scoping the business problem, then selecting the right analytic methodology, and at the end choosing the right tooling and technology (including tools such as automatic creation models and scoring datasets).

Figure 3. Teradata unified data architecture (UDA), courtesy of Teradata

Teradata: Late but Solid Strategy for the Cloud
Despite a slower start compared with other software providers, Teradata is now taking bolder steps toward the cloud by bringing some of its core offerings there.
Being a company that predominantly provides data management products and services for large enterprises explains Teradata’s slower yet more thoughtful cloud adoption process. Teradata’s customers likely needed to make sure Teradata would be able to provide reliable, secure, and high-quality cloud services before making the switch. It is also interesting to note (from conversations with Teradata executives) that Teradata consciously decided—with a bit of pressure from some customers—to take the necessary time to plan, build, and execute an effective cloud strategy.
The previous information can also explain why Teradata is taking a cloud approach based on the provision of hybrid cloud services, recognizing that a big portion of its market still relies on having services both on-premises and in the cloud. It also speaks to Teradata’s decision to give its ecosystem time to transition more smoothly to the cloud.
The hybrid cloud approach from Teradata also considers the idea of enabling organizations to keep using or building heterogeneous cloud ecosystems composed of different products from different vendors to ensure proper integration between all of them. Teradata aims to provide flexibility for cloud adoption.

Figure 4. Teradata Hybrid Cloud, courtesy of Teradata
Teradata’s cloud strategy includes offering different services via on-premises, managed cloud, and public cloud to bring options, flexibility, and adaptability to those customers with specific cloud/on-premises requirements. For example, a company can go with the Teradata Database in AWS, Teradata Analytics in the Cloud, or the Teradata Integrated Data Warehouse offered for on-premises environments.

Finding the Balance between Agility and Reliability
Without a doubt, one of the fundamental challenges every company needs to solve today is how to lead the way and innovate with technology without losing its grip on the provision of real value and satisfaction for the customers it serves.
As the saying goes, “experience is the father of wisdom.” Teradata knows based on its longtime experience that the market it moves in is not an easy one, and, especially today, it needs to move fast into the future to keep up with its customers’ needs. Still, though, Teradata will need to find the proper speed to provide its customers with those products and services that enable state-of-the-art analytics and data management services but still provide the robust enterprise readiness background the majority of its enterprise customers require.
Only time will tell if Teradata’s strategy for the future will work successfully, but the company has a long tradition of making things work in the enterprise world.

(Originally published in TEC's Blog)

Monday, July 25, 2016

Zyme: Emergence and Evolution of Channel Data Management Software



Prior to the official launch of the new version of Zyme’s solution, I had the opportunity to chat with and be briefed by Ashish Shete, VP of Products and Engineering at Zyme, regarding version 3.0 of what Zyme describes as its channel data management (CDM) solution platform.
This conversation was noteworthy from both the software product and industry perspectives. In particular, the solution is relevant to an industry that needs software and technology solutions to help control, streamline, and improve the management of a fascinating and complex ecosystem called the distribution channel.
Zyme aims to increase the efficiency of this ecosystem through its CDM platform.

The distribution channel: a hidden monster
According to the United Nations Conference on Trade and Development (UNCTAD):
Driven by favorable policies, technological innovation and business models bringing down the costs of cross-border transactions, international trade in goods and services added about 20 trillion US$ during the last 25 years, going from about 4 trillion US$ in 1990 to about 24 trillion US$ in 2014.
Global business is now “business as usual,” as it is the norm for a global economy. As manufacturers and service providers put goods on the market that are worth trillions of dollars, a huge infrastructure of distributors, resellers, retailers, and value-added resellers (VARs)—what we call the channel—is responsible for selling and moving them around the globe.
As more goods and services reach new markets and new trade and commercialization models are created, the channel becomes an increasingly complex ecosystem that moves an immense flow of goods from many different places (see Figure 1 below).

Figure 1. A simple version of the channel (Image courtesy of Zyme)

As a result, manufacturers and service providers are experiencing challenges in handling the increasing volume and diversity of data coming from the channel while still maintaining visibility into the channel and garnering insight into when and how their products and services are being sold and moved within it.

Simply managing this data is typically a complex and cumbersome task. This is because the data collected from the channel originates from different sources, and comes in different formats (text files, spreadsheets, via Open Database Connectivity (ODBC) connectors to third-party systems, etc.) and diverse structures (plain text, XML files, etc.). The challenge then is to find the most efficient way to collect, clean, organize, and consolidate this variety of data in order to gain visibility and insight from all these data points.
Companies like Zyme offer CDM software solutions as a concrete means to address this challenge. But what is a CDM solution? Well, in the words of Zyme, CDM is:
a discipline concerned with the acquisition and use of data originating from the channel. It enables companies to significantly grow their business by offering transformative insights into the way business is conducted in the channel.
In other words, a CDM solution offers a series of tools that enable customers or users to efficiently manage the data coming from the channel. This includes the following:

  • Integration with third-party systems
  • Automated data collection
  • Data enrichment functionality
  • Advanced analytics and reporting capabilities

Zyme aims to achieve complete channel visibility through its cloud CDM platform, which collects the raw data originating from partners, and pushes it to the proprietary technologies and content libraries, which then transform it into usable data for intelligence gathering. Once the data is ready, it can be processed and consumed for analysis and visualization through dashboards and/or other specific third-party analytics systems.

The channel data management market has enormous potential for growth and evolution. And a company such as Zyme, with its combination of expertise and innovative technology, keeps constantly developing this segment of the data management market.

Proof of this is the consistent growth of Zyme―which accounts for more than 70% of market share. The company expects to process more than $175 billion in channel revenues and more than 1 billion transactions this year thanks to its set of big customers, which includes Microsoft, VMWare, and GE, just to name a few of the players.

Figure 2. Zyme Screencap (Courtesy of Zyme)

Zyme adds power with version 3.0 
On June 30th, Zyme announced the release of version 3.0 of its CDM solution. This release keeps with the company’s mission to expand the platform and provide channel visibility to global enterprises. Zyme’s new version has been enriched with several improvements, three of which are core to the new direction of the company:

  • The addition of zymeEcommerceSM to the platform. This new e-commerce offering will give companies more visibility into online shelf space. This new solution can keep track of metrics such as competitors’ product positioning, pricing, and customer perception across e-commerce channels—and consequently delivers market intelligence.
  • The addition of the new zymeIncentives solution. This solution allows companies to perform incentives management, and thus automatically calculate and validate rebates and credits earned by partners based on Zyme’s existing decision-grade data. The solution is also able to communicate as well as facilitate incentives payments to channel partners quickly and seamlessly.
  • Zyme’s approach to the Internet of Things (IoT) called zymeCDMSM. This enhances Zyme’s existing functionality with capabilities for tracking connected devices down to individual serial numbers in real time. This in turn improves visibility into product movement, such as mapping out a product’s complete route to a customer for a manufacturer, with the ultimate goal of closing the loop between manufacturers and end users. 


In regard to its new version, Chandran Sankaran, Zyme’s CEO mentioned:
The Zyme cloud platform 3.0 makes our proprietary technologies and comprehensive content libraries, including more than 1.5 million channel partners and the largest directory of products and retailers, available to customers through a modern, scalable, SaaS platform. Global enterprises have immediate access to complete, accurate and timely data from resellers and distributors to unlock the enormous value that had previously been trapped in the channel due to inefficient and outdated reporting systems and processes.
On the other hand, on the customer side, Kevin Nusky, Director of Marketing and Sales Operations at Schneider Electric’s IT Business Unit had the following to say about Zyme’s new release:
More than 65 percent of our sales go through a distribution system, so we can't make informed business decisions without accurate data from channel partners. Zyme delivers unprecedented partner reporting accuracy, which has led to improved inventory management, reduced rebate overpayments, increased revenue through better partner development and accelerated channel growth and success.
Building on its core mission to deliver channel visibility to global enterprises, the company offers a targeted solution that provides complete channel visibility. Zyme’s cloud platform 3.0 aims to empower companies to obtain the maximum value from the channel sales.

Zyme in a blue sea
It appears that Zyme has found in channel data management a market with huge potential, where competitors appear to be scarce and users are willing to consider these new types of software offerings. In this market, the IoT could empower companies like Zyme with the tools for improving the mechanisms driving complete channel visibility for its customers.
As with many other types of enterprise software applications, Zyme’s success will depend on how efficiently it can integrate with the existing software stack (customer relationship management [CRM], enterprise resource planning [ERP], and other systems) in order to ensure data management agility and timeliness, as well as accurate visibility and natural interactivity with other business operations. Zyme appears to be on the right path to achieving these goals.




Wednesday, June 15, 2016

An Interview with Dataiku's CEO: Florian Douetteau


As an increasing number of organizations look for ways to take their analytics platforms to higher ground, many of them are seriously considering the incorporation of new advanced analytics disciplines; this includes hiring data science specialists and adopting solutions that can enable the delivery of improved data analysis and insights. As a consequence, this is also triggering the emergence of new companies and offerings in this area.


Dataiku is one of this new breed of companies. With its Data Science Studio (DSS) solution, Dataiku aims to offer a full data science solution for both experienced and non-experienced data science users.

On this occasion I had the chance to interview Florian Douetteau, Dataiku’s CEO, and pick up some of his thoughts and interesting views regarding the data management industry and, of course, his company and software solution.

A brief Bio of Florian 

In 2000, at age 20, he dropped out of the prestigious “Ecole Normale Supérieure” math courses and decided to look for the largest dataset he could find, and the hardest related problem he could solve.

That’s how he started working at Exalead, a search engine company that back at the time was developing technologies in web mining, search, natural language processing (NLP), and distributed computing. At Exalead, Florian rose to become VP of Product and R&D. He stayed at the company until it was acquired in 2010 by Dassault Systèmes for $150M (a pretty large amount by French standards).

Also in 2010, when the data deluge was pouring into new seas, Florian worked in the social gaming and online advertising industry, an industry where machine learning was already being applied to petabytes of data. Between 2010 and 2013 he held several positions as consultant and CTO.

By 2013, Florian, along with three other co-founders, had created Dataiku with the goal of making advanced data technologies accessible to companies that are not digital giants. Since then, one of Florian’s main goals as CEO of Dataiku has been to democratize access to data science.

So, you can watch the video or listen to the podcast in which Florian shares with us some of his views on the fast evolution of data science, analytics, and big data, and of course, his data science software solution.






 Of course, please feel free to let us know your comments and questions.

Monday, April 18, 2016

Altiscale Delivers Improved Insight and Hindsight to Its Data Cloud Portfolio


Logo courtesy of Altiscale



Let me just say right off the bat that I consider Altiscale a really nice alternative to the likes of Hortonworks, Cloudera, or MapR for the provisioning of Big Data services. The Palo Alto, California–based company offers a full Big Data platform in the cloud via its Altiscale Data Cloud offering. In my view, Altiscale has dramatically increased the appeal of its portfolio with the launch of the Altiscale Insight Cloud and a partnership with Tableau, which will bring enhanced versatility and power to Altiscale’s set of services for Big Data.

The new Altiscale Insight Cloud

On March 15th, Altiscale released its new Altiscale Insight Cloud solution. In the words of Altiscale, this is a “self-service analytics solution for Big Data.” Altiscale Insight Cloud aims to equip business analysts and information workers with the necessary tools for querying, analyzing, and getting answers from Big Data repositories using the tools that they are familiar with, such as Microsoft Excel and Tableau.

According to the California-based company, with this new offering Altiscale will be able to provide its customers with a robust self-service tool and an accessible, easy-to-query data lake infrastructure. As such, companies will be able to avoid much of the complex and difficult preparation process involved in providing users with easy and fast access to Big Data sources.

To achieve simplicity and agility, Altiscale relies on having a converged architecture, so that on the one hand it can minimize the need for data movement and replication, especially across Big Data sources, and on the other hand, it can eliminate the need for separate relational data stores in order to reduce organizational costs and management efforts.

According to Raymie Stata, chief executive officer (CEO) and founder of Altiscale, the Insight Cloud:

Solves the challenge of bringing Big Data to a broader range of users, so that enterprises can quickly develop new offerings, better target customers, and respond to shifting market or operational conditions. It’s a faster and easier way to get from Big Data infrastructure to insights that drive real business value.

Altiscale considers that its Insight Cloud will be able to replace many more complex and expensive alternatives, allowing organizations to get their hands on Big Data broadly and quickly, without heavy information technology (IT) involvement. As such, Altiscale Insight Cloud will have a significant impact on the speed and facility with which organizations will be able to access and analyze Big Data sources.

As a high-performance, self-service analytics solution, some of the core features of the Altiscale Insight Cloud include:


  • interactive Structured Query Language (SQL) queries,
  • dynamic visualizations,
  • real-time dashboards, and 
  • other reporting and analytics capabilities.


The big news is that with its Insight Cloud offering, Altiscale will be delivering not only a reliable Big Data platform, but also an extension to its infrastructure that can simplify the connection between Big Data and the end user, which is currently a complex, slow, and expensive process for many organizations. This can also significantly reduce the need for expensive, proprietary solutions—not to mention that this new offering can give many business analysts easier and faster access to an organization’s existing Hadoop data lake.

Of course, organizations interested in this offering will need to consider a number of things including Altiscale’s power to perform data preparation and cleaning to ensure high-quality data and profiling. But without a doubt, this is a wise step from Altiscale: to provide its customers with the next logical step in the Big Data infrastructure, which is the ability to perform fast and efficient analysis.

Altiscale and Tableau: Business intelligent partnership?

Within a few short weeks of the Altiscale Insight Cloud launch, Altiscale announced a partnership with data discovery and visualization powerhouse Tableau. The partnership with Tableau will, according to both vendors:

make it easier for business analysts, IT professionals, and data scientists to access, analyze, and visualize the massive volumes of data available in Hadoop.

Additionally, according to Dan Kogan, director of product marketing at Tableau:

Altiscale shares our mission to help people see and understand their data. Partnerships with leading Hadoop and Spark providers such as Altiscale help us to bring rich visual analytics to anyone within the enterprise looking to derive value from data.

Now users can use Tableau connected to the Altiscale Insight Cloud directly via Open Database Connectivity (ODBC), the standard application programming interface (API) for accessing database management systems (DBMSs). Once connected, Altiscale Insight Cloud will enable users to create visualizations and perform analysis similarly to working with other databases.

Users will be able to use Tableau’s easy features to drag and drop fields, filter data, analyze data, and derive insights to create visualizations that can later be published to Tableau Server. Additionally, there is a noteworthy feature that allows users to reuse intermediate solutions provided by Altiscale partners, so that users can first aggregate and catalog data prior to creating visualizations with Tableau, thus providing extra flexibility and power to the Altiscale-Tableau connection.

Of course, the first thing that stands out from this partnership is the opportunity for thousands of users on both ends of the partnership, and from different disciplines, to be able to use an appealing and easy-to-use tool such as Tableau on the one hand, and on the other, to easily crack the data coming from large and complex data repositories residing in Hadoop.

This partnership shows how Big Data and analytics and business intelligence (BI) providers are moving, industry-wide, to increasingly narrow the functional gaps between Big Data sources and their availability for analysis, while widening the number of options for incorporating Big Data within enterprise analytics strategies.

While such a partnership is not at all surprising, it is relevant to the continuous evolution and maturity of new enterprise BI and analytics platforms.

But what do you think? Of course, I look forward to hearing your comments and suggestions. Drop me a line, and I’ll respond as soon as possible.


Wednesday, March 30, 2016

Hortonworks’s New Vision for Connected Data Platforms

Courtesy of Hortonworks
On March 1, I had the opportunity to attend this year’s Hortonworks Analyst Summit in San Francisco, where Hortonworks announced several product enhancements, new versions, and a new definition of its strategy going forward.

Hortonworks seems to be making a serious attempt to take over the data management space, while maintaining a commitment to open source and especially to the Apache Foundation. Thus, as Hortonworks keeps gaining momentum, it’s also consolidating its corporate strategy and bringing a new balance to its message (combining both technology and business).

By reinforcing alliances, and at the same time moving further towards the business mainstream with a more concise messaging around enterprise readiness, Hortonworks is declaring itself ready to win the battle for the big data management space.

The big question is whether the company’s strategy will be effective enough to succeed at this goal, especially in a market already overpopulated and fiercely defended by big software providers.

Digesting Hortonworks’s Announcements
The announcements at the Hortonworks Analyst Summit included news on both the product and partner fronts. With regard to products, Hortonworks announced new versions of both its Hortonworks Data Platform (HDP) and Hortonworks DataFlow (HDF).

HDP—New Release, New Cycle
Alongside specific features to improve performance and reinforce ease of use, the latest release, HDP 2.4 (figure 1), includes the latest generation of Apache’s large-scale data processing framework, Spark 1.6, along with Ambari 2.2, the Apache project for making Hadoop management easier and more efficient.

The inclusion of Ambari appears to be key to providing a solid, centralized management and monitoring tool for Hadoop clusters.


Figure 1. Hortonworks emphasizes enterprise readiness for its HDP version
(Image courtesy of Hortonworks)

Another key announcement with regard to HDP is a new release cycle. Interestingly, it aims to provide users with a consistent product featuring core stability. Under the new cycle, core HDP services such as HDFS, YARN, and MapReduce, as well as Apache ZooKeeper, will be released yearly and aligned with the compatible Apache Hadoop version defined by the “ODPi Core,” currently 2.7.1. This standardization is meant to ensure a stable software base for mission-critical workloads.

On the flip side, the extended services that run on top of the Hadoop core, including Spark, Hive, HBase, Ambari, and others, will be released continually throughout the year so that these projects stay up to date.

Last but not least, HDP’s new version also comes with the new SmartSense 1.2, Hortonworks’s issue resolution application, featuring automatic scheduling and uploading as well as over 250 new recommendations and guidelines.

Growing NiFi to an Enterprise Level
Along with HDP, Hortonworks also announced version 1.2 of HDF, Hortonworks’s offering for managing data in motion by collecting, manipulating, and curating data in real time. The new version includes new streaming analytics capabilities for Apache NiFi, which powers HDF at its core, and support for Apache Storm and Apache Kafka (figure 2).

Another noteworthy feature coming to HDF is support for integration with Kerberos, which will enable centralized authentication across the platform and other applications and ease its management. According to Hortonworks, HDF 1.2 will be available to customers in Q1 of 2016.

Figure 2. Improved security and control added to Hortonworks new HDF version
(Image courtesy of Hortonworks)


Hortonworks Adds New Partners to its List
The third announcement from Hortonworks at the conference was a partnership with Hewlett Packard Labs, the central research organization of Hewlett Packard Enterprise (HPE).

The collaboration is mainly a joint effort to enhance the performance and capabilities of Apache Spark. According to Hortonworks and HPE, it will focus on the development and analysis of a new class of analytic workloads that benefit from using large pools of shared memory.

Says Scott Gnau, Hortonworks’s chief technology officer, with regard to the collaboration agreement:

This collaboration indicates our mutual support of and commitment to the growing Spark community and its solutions. We will continue to focus on the integration of Spark into broad data architectures supported by Apache YARN as well as enhancements for performance and functionality and better access points for applications like Apache Zeppelin.

According to both companies, the collaboration has already generated interesting results, including more efficient memory usage, faster sorting, and better in-memory computation, all of which improve Spark’s performance.

The results of this collaboration will be contributed back to the Apache Spark community as new technology, benefiting this important piece of the Apache Hadoop ecosystem.

Commenting on the new collaborations, Martin Fink, executive vice president and chief technology officer of HPE and board member of Hortonworks, said:

We’re hoping to enable the Spark community to derive insight more rapidly from much larger data sets without having to change a single line of code. We’re very pleased to be able to work with Hortonworks to broaden the range of challenges that Spark can address.

Additionally, Hortonworks signed a partnership with Impetus Technologies, Inc., another solution provider built on open source technology. The agreement includes collaboration around StreamAnalytix™, an application that provides tools for rapid, low-code development of real-time analytics applications using Storm and Spark. Both companies expect that, used together, HDF and StreamAnalytix will give companies a complete and stable platform for the efficient development and delivery of real-time analytics applications.

But The Real News Is …
Hortonworks is rapidly evolving its vision of data management and integration, and this was, in my opinion, the biggest news of the analyst event. Hortonworks’s strategy is to integrate the management of data at rest (data residing in HDP) with data in motion (data HDF collects and curates in real time), since managing both together can power actionable intelligence. It is in this context that Hortonworks is working to increase integration between the two platforms.

Hortonworks is now taking a new go-to-market approach to increase the quality and enterprise readiness of its platforms. Along with ensuring that ease of use removes barriers to end-user adoption, its marketing message is changing. The Hadoop-based company now sees the need to take a step further and convince businesses that open source does more than just do the job; it is in fact becoming the quintessential tool for any important data management initiative, and, of course, Hortonworks is the best vendor for the job. Along these lines, Hortonworks is taking steps to provide Spark with enterprise-ready governance, security, and operations to ensure readiness for rapid enterprise integration. This is to be achieved through the inclusion of Apache Ambari and other Apache projects.

One additional yet important aspect of this strategy is Hortonworks’s work on enterprise readiness, especially regarding issue tracking (figure 3), monitoring for mission-critical workloads, and security reinforcement.


Figure 3. SmartSense 1.2 includes more than 250 recommendations
(Image courtesy of Hortonworks)


It will be interesting to see how this new strategy works for Hortonworks, especially in a big data market with fierce competition, where many other vendors, including important Hortonworks partners, are pushing extremely hard to get a piece of the pie.

Taking its data management strategy to a new level is indeed bringing many opportunities for Hortonworks, but these are not without challenges as the company moves into the bigger enterprise footprint of the data management industry.

What do you think about Hortonworks’s new strategy in data management? If you have any comments, please drop me a line below and I’ll respond as soon as I can.

(Originally published)

Tuesday, March 8, 2016

Creating a Global Dashboard. The GDELT Project



There is probably no bigger dream for a data geek like myself than creating the ultimate data dashboard or scorecard of the world. One that summarizes and enables the analysis of all the data in the world.

Well, for those of you who have also dreamt about this, Kalev H. Leetaru, a senior fellow at the George Washington University Center for Cyber & Homeland Security, has tapped into your dreams and is working on something in this realm. Leetaru, whom some have called “The Wizard of Big Data,” is developing a platform for monitoring and better understanding how human society works.

The project, called the Global Database of Events, Language, and Tone, or simply The GDELT Project, is an ambitious endeavor created to “crack” the social numbers of the world, with the aim of improving our understanding of human society.

As described by the folks at GDELT:

The GDELT Project came from a desire to better understand global human society and especially the connection between communicative discourse and physical societal-scale behavior. The vision of the GDELT Project is to codify the entire planet into a computable format using all available open information sources that provides a new platform for understanding the global world.

To do this, The GDELT Project has collected information dating back to 1979 and keeps updating it regularly, so its catalogs are always fresh. According to GDELT, the project already has more than a quarter billion event records in more than 300 categories. It also keeps up to date a massive network diagram that connects each individual to all existing entities and events in the world, such as locations, organizations, themes, emotions, and other data.

Information is gathered from many sources, including Google, Google Ideas, Google News, the Internet Archive, and BBC Monitoring, among many others.

So what makes The GDELT Project so interesting?

Well, it’s a perfect opportunity for data aficionados to lay their hands on social data from around the world in three different ways:


  1. Using GDELT Analysis Service, a free cloud-based offering that includes tools and services to visualize, explore, and export the data.
  2. Using the complete dataset available through Google’s BigQuery service.
  3. Downloading data in CSV format.


This allows different types of users to get their hands on the data in the way that suits them best. So, for immediate consumption and analysis, users can go with the first option. Users with more specific requirements or with complex projects can use the data provided by the second or third option.
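
As a quick illustration of the second option, here is a minimal sketch of querying GDELT event data from Python with Google’s BigQuery client library. It assumes you have Google Cloud credentials configured and a project with BigQuery enabled; the table name below is the commonly referenced public GDELT events table, so verify it against the current public dataset listing before running.

```python
from google.cloud import bigquery

# Requires application default credentials and a GCP project with
# BigQuery enabled (the query itself runs against a public dataset).
client = bigquery.Client()

# Count GDELT event records per year. "gdelt-bq.full.events" is the
# public GDELT events table as commonly referenced; check the public
# dataset catalog in case the name has changed.
query = """
    SELECT Year, COUNT(*) AS event_count
    FROM `gdelt-bq.full.events`
    GROUP BY Year
    ORDER BY Year
"""

for row in client.query(query).result():
    print(row.Year, row.event_count)
```

For the third option, the same kind of exploration can be done locally by downloading one of the CSV exports and loading it with a tool such as pandas.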

Whichever way you choose to access the worldwide data, this could be a great opportunity for you, my dear data junkie, to explore and embark upon a data deluge journey for a new school project, entrepreneurial venture, or just a playtime project.

This is just a brief intro to a really cool project. I’ll update you on major advancements of The GDELT Project as they come along.

In the meantime, I would encourage you to have a look at this nice 20-minute video about The GDELT Project.

As always, you can also drop me a line below. Enjoy.



