Our era is characterized by the production of massive amounts of data, which increasingly drives enterprises to adopt a data-driven culture, i.e., to base their processes and decisions on the collection, processing and analysis of datasets; this is why most enterprises consider Big Data. Big Data refers to the collection and processing of data that is very difficult to capture, store and analyze with the capabilities of state-of-the-art data management systems. Such data is characterized by the famous Vs (Volume, Variety, Velocity, Veracity), which differentiate it from the data handled by conventional processing systems.
Although many organizations refer to their enterprise data deployments as “Big Data”, only a few really understand the term and its implications, while even fewer have managed to deploy such systems successfully. The deployment of Big Data systems is a challenging task, from both a management and a technological point of view. Enterprises must therefore try to understand the complexities involved and take the proper steps.
Management and Planning
Big Data is a non-trivial project, which must be proactively and carefully planned. First, there is a need to justify a Big Data deployment by identifying the Vs in enterprise data and the importance of dealing with them toward improving business results. The data sources to be exploited must also be identified, along with a strategy for their gradual integration. At the same time, the data-driven processes that the Big Data system will support should be identified.
A Big Data project is never implemented overnight: it requires a phased approach, which progressively deploys data sources and applications, along with data analytics of increasing sophistication. As a result, a Big Data project management plan should make provisions for gradual deployment, including the smooth migration of data from existing systems. As an early step, this planning may involve a pilot project, which uses a limited number of data sources in the scope of a smaller-scale deployment. However, any Big Data project should be underpinned by a proper architecture, which scales in a cost-effective way in order to efficiently handle the ever-increasing amount of enterprise data.
One of the major management challenges for Big Data is the assembly of a proper team. Successful Big Data deployments call for the effective collaboration of individuals with a wide range of skill sets, including database experts, computer scientists, programmers, data scientists, statisticians, business analysts and more. Despite the enthusiasm around Big Data, there is still a shortage of competent people who can fulfill these roles in real-life projects.
Management challenges also stem from the fact that Big Data requires the deployment of a pool of different technologies, which have to be orchestrated based on a disciplined Big Data architecture. Hence, CIOs, project managers and the management team have to monitor a challenging technology project, which involves complex procurement, deployment and integration processes.
Big Data is typically based on a pool of leading technologies including:
- A reliable and scalable physical infrastructure: Big Data hinges on the deployment of a resilient, high-capacity, redundant distributed computing infrastructure, such as a private or a public cloud. This infrastructure ensures that data is hosted in a reliable and scalable way, acting as the cornerstone of the Big Data architecture.
- Operational databases and data sources: This refers to middleware components for accessing and collecting not only structured data, but also unstructured and semi-structured data. The respective technologies guarantee that the Big Data deployment will be able to process all sorts of enterprise data, whether it resides in corporate databases, web sites, social media posts, e-mails or even unstructured notes.
- Database systems and tools: A Big Data system employs a variety of databases, including both SQL and NoSQL databases, where data from the different sources is persisted. These databases come with a range of tools for managing the data.
- Data warehouses and data marts: Selected data (typically data with high business value) makes its way into warehouses and marts, which enable scalable multi-level partitioning and segmentation of the data.
- Reports: Among the added values of Big Data technologies is their ability to present intuitive dashboards and interactive reports, which help end-users understand the insights derived from Big Data processing and analytics.
- Big Data applications: The ultimate goal of the Big Data technology infrastructure is to support the operation of its applications, which either improve business processes or facilitate managerial decision making.
- Security: Big Data projects need to take all necessary provisions in order to secure the storage of, and access to, enterprise data. This includes compliance with any applicable regulations for security, privacy and data protection.
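To make the interplay between the data-source and database layers above more concrete, the following is a minimal, hypothetical sketch of an ingestion step that normalizes both structured records and unstructured text into a single store. An in-memory SQLite database stands in for the operational data store; a real deployment would use a scalable SQL or NoSQL cluster, and the `ingest` helper is an illustrative name, not part of any specific product.

```python
import json
import sqlite3

# In-memory SQLite stands in for the operational data store of a
# real deployment (which would be a distributed SQL/NoSQL cluster).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (source TEXT, kind TEXT, payload TEXT)")

def ingest(source, record):
    """Normalize any record into a (source, kind, JSON payload) row."""
    if isinstance(record, dict):        # structured data, e.g. a DB row
        kind = "structured"
        payload = json.dumps(record)
    else:                               # free text: e-mails, notes, posts
        kind = "unstructured"
        payload = json.dumps({"text": str(record)})
    conn.execute("INSERT INTO events VALUES (?, ?, ?)",
                 (source, kind, payload))

ingest("crm_db", {"customer_id": 42, "order_total": 99.5})
ingest("mailbox", "Customer asked about a refund for order #42.")
conn.commit()

rows = conn.execute("SELECT source, kind FROM events").fetchall()
print(rows)  # [('crm_db', 'structured'), ('mailbox', 'unstructured')]
```

The design point illustrated here is that a common envelope (source, kind, payload) lets downstream analytics treat heterogeneous enterprise data uniformly, deferring format-specific parsing to later stages.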
Big Data Analytics Deployment
With a proper technology infrastructure at hand, the focus of a Big Data deployment can be put on devising effective analytical models for extracting knowledge. This is a structured process that typically involves the following steps:
- Understanding the business problem, including what kind of knowledge should be derived from the data. This might, for example, be the identification of the most profitable customers in a retail application, the estimation of the lifetime of a machine or its components in a manufacturing environment, or even the medical-history parameters that affect the efficacy of a drug in a pharmaceutical application.
- Understanding the available data, as a means of identifying candidate data analytics models. By inspecting the data, analysts gain insight into which models are likely to lead to the desired knowledge.
- Preparing the data to be used in the context of a target business problem, through the exploitation of the technologies that deal with data acquisition and storage. This may involve converting structured and unstructured data sources to the format needed for the problem at hand.
- Devising and testing machine learning and data mining models such as decision trees, neural networks, logistic regression, Bayesian classification and more.
- Evaluating alternative models based on enterprise data derived from the field, until one or more models achieve the desired performance and efficiency. Evaluation is a multi-faceted process that is not limited to the precision and efficiency of a model, but also considers aspects such as deployment speed and latency.
- Deploying the models over the established infrastructure and using them in the scope of Big Data applications, reports and visualizations.
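The devise-and-evaluate steps above can be sketched in miniature. The example below is a hedged illustration, not a production recipe: the synthetic dataset of customer spend figures and the single-threshold "stump" classifier are simplified stand-ins for real enterprise data and for richer models such as decision trees or logistic regression, but the workflow (fit on training data, then evaluate on held-out data) mirrors the steps described.

```python
import random

# Synthetic toy dataset: (monthly_spend, is_profitable) pairs.
# A customer is labeled profitable when spend exceeds 50.
random.seed(0)
data = [(spend, 1 if spend > 50 else 0)
        for spend in (random.uniform(0, 100) for _ in range(200))]

random.shuffle(data)
train, test = data[:150], data[150:]   # hold out data for evaluation

def fit_stump(rows):
    """Pick the spend threshold that best separates profitable customers."""
    best_t, best_acc = None, 0.0
    for t in range(0, 101, 5):         # candidate thresholds
        acc = sum((x > t) == bool(y) for x, y in rows) / len(rows)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

threshold = fit_stump(train)           # "devise and test" step

# "Evaluate" step: measure accuracy on held-out field data.
accuracy = sum((x > threshold) == bool(y) for x, y in test) / len(test)
print(threshold, round(accuracy, 2))
```

On this cleanly separable toy data the learned threshold recovers the true cut-off; with real, noisy enterprise data, the evaluation step would compare several candidate models and also weigh latency and deployment cost, as noted above.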
Preparing for Big Data
Big Data projects are indeed challenging, but with elaborate planning and disciplined management they can pay off, leading to measurable return on investment and improved business results. To this end, there are already solutions and good practices that a company can explore together with its technology partner. Get prepared: you cannot afford to miss the Big Data train.