The Big Data Supply Chain and Data Science Management
The Big Data Supply Chain and Data Science Management is an innovative approach to managing an organization's data. While large and diverse data sets can be turned into an asset, there is no general consensus on how to manage them or address everyone's needs. Traditional data strategies assume data is created, distributed, and consumed within an organization's four walls.
Yet in a big data world it is not enough to simply manage and track where data is created and consumed. Knowing the origin and quality of data, and how it moves, migrates, transforms and is used to make decisions, is critical to managing a modern organization. This new approach to managing data assets expands the traditional information life cycle to include data cleansing, quality, validation, sourcing, manipulation, provisioning, logistics and secondary uses, so that the transformation of information into valuable, actionable intelligence can be understood and managed.
The new world of big data has sparked a debate about traditional versus new strategies for managing data. The traditional way is based on an organization creating, storing, analyzing and distributing data internally. Most traditional (last 15 years) data warehouse / business intelligence platforms are designed on this model and can manage and track where data is created and consumed.
The new strategy for managing big data includes:
- both internal and external data
- structured, semi-structured and raw unstructured data
- both internal and external data analytical ecosystems and applications
- both internal and external data providers
- business and data analysts
- both internal and external professional data scientists
As a result, in addition to managing and tracking where data is created and consumed, an organization needs fast and easy access to large volumes and varieties of data and to know how it moves, transforms and migrates. Traditional data warehouse / business intelligence platforms and Master Data Management techniques are unable to do this job in a world of high data volume and variety coming from multiple sources at increasing velocity and varying quality.
Furthermore, the past 15 years have seen a shift from custom-built to packaged applications for automating knowledge and business processes. The design flaw is that custom code and middleware are required to move all this data between the packaged systems. The headaches, money and time spent on data migration solutions, in addition to the human capital needed to clean the data, are huge and wasteful. Current ETL tools are primitive; while they save time and reduce custom coding, they are not a long-term solution. Moreover, traditional ETL design will not work with the new volume, variety and velocity of large internal and external data sets.
The Big Data Supply Chain and Data Science Management offers a solution. The data supply chain concept was pioneered by Walmart years ago and seeks to broaden the traditional corporate information life cycle to include the numerous data sourcing, provisioning and logistical activities required to manage data. Walmart understood the design flaw in having a separate custom distribution system. The solution was a standard distribution system where standardization occurs at the source.
Simply put, the Big Data Supply Chain and Data Science Management is all about standardization of data. Focus on designing and building one standardized big data supply chain instead of custom distribution systems for each application. Eliminate middleware, ETL and writing massive amounts of custom code to standardize, clean and integrate data.
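To make the idea concrete, here is a minimal sketch, in Python, of what "standardize at the source" can look like: each producing system maps its own raw records into one agreed canonical schema before publishing them to the data supply chain, so downstream applications never need per-application ETL. The schema, field names and mapping rules below are hypothetical illustrations, not a prescribed standard.

```python
# Minimal sketch of "standardize at the source": every producing system maps its
# records into one shared, canonical schema before publishing them to the data
# supply chain, so no per-application ETL is needed downstream.
# The schema and field names here are illustrative assumptions, not a standard.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class CanonicalCustomer:
    customer_id: str     # globally unique, agreed-upon identifier
    full_name: str
    country_code: str    # ISO 3166-1 alpha-2
    created_at: str      # ISO 8601 timestamp, UTC

def standardize_crm_record(raw: dict) -> CanonicalCustomer:
    """Map one source system's raw record into the canonical schema."""
    return CanonicalCustomer(
        customer_id=str(raw["cust_no"]).strip(),
        full_name=f'{raw["first"].strip()} {raw["last"].strip()}'.title(),
        country_code=raw.get("country", "US").upper()[:2],
        created_at=datetime.now(timezone.utc).isoformat(),
    )

if __name__ == "__main__":
    raw = {"cust_no": " 00042 ", "first": "ada", "last": "LOVELACE", "country": "gb"}
    print(asdict(standardize_crm_record(raw)))
```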
Yet standardizing data at the source is only part of the solution. The other part is Data Science Management (DSM).
DSM can be defined as the set of policies, procedures, applications and technologies for harmonizing and managing data, allowing people to easily and quickly find trusted information and make optimal decisions.
DSM standardizes data, enabling better data governance to capture and enforce clean, reliable data for optimal data science, business analytics and decision making. Standardized values and definitions give users a uniform understanding of data stored in the various databases of the big data analytical ecosystem, so they can find and access the data they need quickly and easily.
DSM comprises a set of processes and tools for defining and managing data. The quality of data shapes decision making, and DSM helps organizations leverage trusted information to make better decisions, increase profitability and reduce risk.
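One way to picture a DSM building block is a shared vocabulary of standardized values that every system harmonizes to before data is provisioned. The sketch below assumes a hypothetical "order status" domain; the code values and synonyms are illustrative only.

```python
# Illustrative sketch of one DSM building block: a shared reference table of
# standardized values, plus a check that enforces it before data is provisioned.
# The domain ("order_status") and code values are hypothetical examples.
STANDARD_ORDER_STATUS = {"OPEN", "SHIPPED", "CANCELLED"}

# Source-specific synonyms harmonized to the standard vocabulary.
SYNONYMS = {
    "open": "OPEN", "o": "OPEN",
    "sent": "SHIPPED", "shipped": "SHIPPED",
    "cancel": "CANCELLED", "void": "CANCELLED",
}

def harmonize_status(value: str) -> str:
    """Map a raw status value to its standardized code, or raise if unknown."""
    code = SYNONYMS.get(value.strip().lower())
    if code is None or code not in STANDARD_ORDER_STATUS:
        raise ValueError(f"Unmapped status value: {value!r}")
    return code

print(harmonize_status("  Sent "))   # -> "SHIPPED"
```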
DSM is critical for the new Data-as-a-Service (DaaS) concept, where data is provided on demand to the user regardless of geographic or organizational structures. Benefits include (a minimal service sketch follows this list):
- Improving organizational agility
- Providing a single trusted view of people, processes and applications
- Allowing strategic decision making
- Reducing operational costs
- Increasing compliance with legal and regulatory requirements
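As a rough illustration of the DaaS idea, the sketch below serves standardized records on demand over HTTP using only the Python standard library. The endpoint path, port and in-memory dataset are assumptions made for the example; a real service would sit on governed, DSM-managed data stores.

```python
# Minimal Data-as-a-Service sketch: standardized data served on demand over HTTP,
# independent of where the consumer sits geographically or organizationally.
# Uses only the standard library; the path layout and dataset are hypothetical.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

TRUSTED_CUSTOMERS = {
    "00042": {"customer_id": "00042", "full_name": "Ada Lovelace", "country_code": "GB"},
}

class DaaSHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expected path: /customers/<customer_id>
        parts = self.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] == "customers" and parts[1] in TRUSTED_CUSTOMERS:
            body = json.dumps(TRUSTED_CUSTOMERS[parts[1]]).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), DaaSHandler).serve_forever()
```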
DSM helps organizations handle the following key issues (a small cleansing sketch follows the list):
- Data cleansing
- Data quality
- Data accessibility
- Data redundancy
- Data inconsistency
- Decision making inefficiency
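The first few issues on this list lend themselves to a short illustration. The toy Python sketch below cleanses a small batch of records, applies a simple quality rule, and removes redundant entries; the field names and rules are assumptions for the example, not DSM requirements.

```python
# Toy sketch of cleansing, quality checking and de-duplicating a batch of records.
# Field names ("email", "name") and rules are illustrative assumptions only.
def cleanse(record: dict) -> dict:
    """Trim whitespace on string fields."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}

def passes_quality(record: dict) -> bool:
    """Reject records missing a key field or with an obviously bad value."""
    return bool(record.get("email")) and "@" in record["email"]

def deduplicate(records: list) -> list:
    """Keep the first record seen per email address (redundancy removal)."""
    seen, unique = set(), []
    for r in records:
        key = r["email"].lower()
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

batch = [
    {"email": " Ada@example.com ", "name": "Ada"},
    {"email": "ada@example.com", "name": "Ada L."},   # redundant
    {"email": "", "name": "Unknown"},                  # fails quality check
]
clean = [r for r in map(cleanse, batch) if passes_quality(r)]
print(deduplicate(clean))
```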