by Sanjeev Kapoor 30 Mar 2017
Extracting Knowledge from Big Data: What you need to know

The extraordinary volume of data now produced in different contexts and by various platforms and devices (such as social networks and multi-sensor systems) has given rise to the Big Data movement and the data economy. Big Data is one of the main pillars of the information society and a cornerstone for the next generation of smart systems, which act automatically and intelligently to optimize productivity, business processes and managerial decisions. Despite significant advances in our ability to collect, store, manage and process data characterized by the 4Vs (Volume, Velocity, Variety and Veracity), Big Data’s business value still lies in the analytics. Raw data tends to be useless unless it is properly processed and transformed into actual insights for a business. Applications that help with the diagnosis of diseases, the forecasting of electricity demand, the prediction of a machine’s end of life or the identification of the driving context are all based on processing large volumes of data and deriving knowledge from them.

 

Big Data Analytics and Knowledge Extraction

There are many different techniques for extracting knowledge from raw Big Data. In most cases, data scientists employ statistics to test knowledge-related hypotheses, and machine learning to build high-performance software agents that are able to learn from the data. As part of the data mining and knowledge discovery process, scientists combine statistics and machine learning in a way that integrates theory and heuristics. Furthermore, they undertake other prerequisite activities for extracting and using knowledge, such as data cleaning and data transformation, as well as visualization of the extracted knowledge.
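The prerequisite activities and the hypothesis-testing step mentioned above can be sketched in a few lines of Python. This is only an illustration using the standard library: the sensor readings, the function names and the choice of Welch's t statistic are assumptions made for the example, not something prescribed by the article.

```python
from math import sqrt
from statistics import mean, variance


def clean(values):
    """Data cleaning: drop missing (None) readings."""
    return [v for v in values if v is not None]


def standardize(values):
    """Data transformation: rescale to zero mean and unit sample variance."""
    m, s = mean(values), sqrt(variance(values))
    return [(v - m) / s for v in values]


def welch_t(a, b):
    """Welch's t statistic for the difference between two sample means."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Hypothetical temperature readings from two machines; None marks a missing value.
machine_a = clean([20.1, 19.8, None, 20.4, 20.0, 19.9])
machine_b = clean([21.0, 21.3, 20.9, None, 21.1, 21.2])

# Hypothesis under test: the two machines run at the same mean temperature.
t_stat = welch_t(machine_a, machine_b)
scaled_a = standardize(machine_a)
```

A t statistic far from zero suggests that the two means differ; turning it into a p-value would require the t distribution, which a statistics package would normally supply.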

A data mining and knowledge extraction process may have different targets depending on the business problem at hand. Some of the most common tasks include:

For each of the above tasks, data mining experts have several tools and techniques at their disposal (e.g., decision trees, Bayesian methods, linear regression, k-means clustering), which need to be appropriately configured and parameterized according to the business requirements at hand. The identification, validation and ultimate deployment of an optimal data mining model involves a series of tasks, which are carried out in an iterative fashion.
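As a rough illustration of identifying and validating a model, the sketch below fits two candidate models on training data and keeps the one with the lower error on held-out validation data. The data points, the mean-predicting baseline and the use of mean squared error are all assumptions chosen for the example; a real project would typically rely on a machine learning library and far more data.

```python
from statistics import mean


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b * x (closed form)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def fit_baseline(xs, ys):
    """Baseline model that always predicts the training mean."""
    my = mean(ys)
    return lambda x: my


def mse(model, xs, ys):
    """Mean squared error of a fitted model on a dataset."""
    return mean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])


# Hypothetical data: an input variable versus electricity demand.
train_x, train_y = [1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
valid_x, valid_y = [7, 8], [14.1, 15.9]

# Fit every candidate on the training set, score it on the validation set.
candidates = {"baseline": fit_baseline, "linear": fit_linear}
scores = {
    name: mse(fit(train_x, train_y), valid_x, valid_y)
    for name, fit in candidates.items()
}
best = min(scores, key=scores.get)
```

In practice this loop is repeated with different techniques and parameter settings until a model meets the business requirements.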

The following tasks are part of the data mining process.

 

The CRISP-DM Data Mining Process

Data mining processes analyze the datasets and evaluate alternative data mining models as a means of identifying and selecting the most suitable ones for deployment. The most widely used data mining process is CRISP-DM (Cross Industry Standard Process for Data Mining), which comprises six phases: business understanding, data understanding, data preparation, modeling, evaluation and deployment.
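The iterative nature of CRISP-DM can be sketched as a simple loop over its phases, from business understanding through to deployment. The function name `run_crisp_dm`, the `evaluation_passes` callback and the iteration limit are hypothetical and exist only for this illustration; the sketch assumes that a failed evaluation sends the project back to business understanding.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def run_crisp_dm(evaluation_passes, max_iterations=5):
    """Walk the phases in order; a failed evaluation starts another pass."""
    history = []
    for iteration in range(max_iterations):
        for phase in Phase:
            if phase is Phase.DEPLOYMENT:
                break  # deployment happens only after a successful evaluation
            history.append(phase)
        if evaluation_passes(iteration):
            history.append(Phase.DEPLOYMENT)
            return history
    return history  # iteration budget exhausted, nothing was deployed


# Hypothetical run: the first model fails evaluation, the second one passes.
history = run_crisp_dm(lambda iteration: iteration >= 1)
```

Here `history` records two full passes through the first five phases before deployment is reached.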

 

Other Data Mining Methodologies

CRISP-DM is not the sole data mining methodology in use. Other popular methodologies include KDD (Knowledge Discovery in Databases) and SEMMA (Sample, Explore, Modify, Model, and Assess). These methodologies comprise slightly different phases and activities from those of CRISP-DM. However, they have similar characteristics to CRISP-DM:

Moreover, they comprise similar phases. For example, KDD includes the selection, pre-processing, data transformation, data mining, and interpretation-evaluation phases, while SEMMA comprises the sampling, exploration, modification, modeling and assessment phases. There is no one-to-one mapping between these phases and those of CRISP-DM, but their names clearly reflect the structure of the CRISP-DM data mining process.

In the Big Data era, it is very important to employ experts who have a very good understanding of data mining processes, as the business value of Big Data lies mainly in the analytics. Given the proclaimed talent gap in Big Data experts, it’s always a good idea to look for reliable and knowledgeable business partners that can help you derive knowledge and maximize the value of your data.

2015 IT Exchange, Inc