Oct 2, 2019

Hitachi Commences Provision of Data Preparation Service to Streamline the Preparation Work for Data Use and Analysis Utilizing AI

AI proposes data specifications and fills in missing values for large volumes of diverse data, efficiently preparing high-quality data that customers can readily use


Tokyo, October 2, 2019 --- Hitachi, Ltd. (TSE: 6501, "Hitachi") today announced the development of the Data Preparation Service (the "Service"), which uses AI to streamline data pre-processing, an essential step in preparing data for use and analysis. Hitachi will begin providing the Service on October 2.

The Service analyzes various types of internal and external data and uses the results to carry out sophisticated data transformation and processing that would usually require considerable labor hours. Starting from an understanding of the data specifications and trends, users can assess and verify processing techniques such as cleansing(1), then move smoothly to pre-processing live data. A dedicated instance of the Service is set up for each customer or project and provides functions such as proposing item names, supplementing missing values, and identifying correlations within the input data. The Service also lets experts such as data scientists register and share their data processing know-how, giving customers access to that expertise.
As a result, customers can perform efficient, high-quality data pre-processing even without expert-level skills in disciplines such as programming or statistics. Minimizing the pre-processing workload lets customers focus their time and effort where it is needed most: on analysis. In this way, the Service helps customers put their data to use and supports their digital transformation.
In recent years, businesses have rapidly expanded their use not only of data collected in the course of business operations, but also of heterogeneous data such as that collected from sensors and other IoT devices. Data generated in the field often has no item definitions, and similar data may be managed under different names in different systems. Before such data can be analyzed or used, it must be pre-processed: its specifications must be understood, its format unified, and similar data merged(2).
Businesses report that this pre-processing can consume as much as 80 percent of the total labor hours spent on data analysis. That figure includes the back-and-forth with data scientists and domain experts and the repeated verification of cleansing and integration, none of which can be skipped. In many cases, errors in data integration have also affected the results of data analysis, so accuracy is a concern alongside the over-specialization of analysis work and its dependence on individual skills.
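The release does not say how the Service unifies formats or merges similar data that different systems keep under different names. As a minimal sketch, assuming records arrive as Python dictionaries and using a hand-written synonym table (the field names and the `CANONICAL_NAMES`, `unify_record`, and `merge_sources` helpers are all hypothetical), the merging step might look like this:

```python
# Hypothetical synonym table mapping field names used by different source
# systems to one canonical item name (illustrative only).
CANONICAL_NAMES = {
    "spd": "speed",
    "velocity": "speed",
    "lat": "latitude",
    "lon": "longitude",
    "lng": "longitude",
}


def unify_record(record: dict) -> dict:
    """Lowercase each field name, then map known synonyms to the canonical name."""
    unified = {}
    for key, value in record.items():
        name = key.lower()
        unified[CANONICAL_NAMES.get(name, name)] = value
    return unified


def merge_sources(*sources: list) -> list:
    """Unify field names across all sources, then drop exact duplicate records."""
    merged = []
    seen = set()
    for source in sources:
        for record in source:
            unified = unify_record(record)
            key = tuple(sorted(unified.items()))  # hashable fingerprint of the record
            if key not in seen:
                seen.add(key)
                merged.append(unified)
    return merged


system_a = [{"SPD": 42.0, "Lat": 35.68, "Lon": 139.76}]
system_b = [
    {"velocity": 42.0, "latitude": 35.68, "lng": 139.76},  # same reading as system_a
    {"velocity": 50.5, "latitude": 34.69, "lng": 135.50},
]
print(merge_sources(system_a, system_b))
```

The release describes proposing such correspondences as an AI function of the Service; the fixed table here only stands in for that step.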

Hitachi is expanding its Lumada(3) business, providing various solutions that help businesses derive new value from data. The Service is built on the expertise Hitachi has cultivated through its Lumada initiatives. Data pre-processing typically alternates between two phases: data understanding, in which the data specifications are identified, and assessment and validation, in which processing such as cleansing and integration is trialed based on those specifications. Repeating this cycle iteratively improves data quality. The Service analyzes the information needed to understand the data, such as data items, missing values, and correlations among the data, and presents the results in an intuitive, graphical form.
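To make the kind of profiling described above more concrete, the following sketch counts missing values per item and computes a Pearson correlation between two numeric items. It uses plain descriptive statistics rather than AI, and the record layout and the `missing_counts` and `pearson` helpers are assumptions for illustration only:

```python
import math
from statistics import mean


def missing_counts(rows: list, items: list) -> dict:
    """Count, for each item, the records in which it is absent or None."""
    return {item: sum(1 for row in rows if row.get(item) is None) for item in items}


def pearson(rows: list, x: str, y: str) -> float:
    """Pearson correlation of two numeric items over records where both are present."""
    pairs = [(row[x], row[y]) for row in rows
             if row.get(x) is not None and row.get(y) is not None]
    if len(pairs) < 2:
        return math.nan  # too few complete records to estimate a correlation
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    if sx == 0 or sy == 0:
        return math.nan  # undefined when an item is constant
    return cov / (sx * sy)


rows = [
    {"temp": 20.0, "load": 41.0},
    {"temp": 22.0, "load": 45.0},
    {"temp": None, "load": 47.0},  # missing value, skipped by pearson()
    {"temp": 25.0, "load": 50.0},
]
print(missing_counts(rows, ["temp", "load"]))
print(round(pearson(rows, "temp", "load"), 3))
```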
The Service also includes a feature for registering and sharing processing methods (pre-processing logic(4)), which makes it easy to verify highly specialized processing such as supplementing missing values and eliminating outliers or duplicate data. Because the Service integrates with ETL tools(5), verified pre-processing logic does not have to be reimplemented piece by piece in the ETL tool, so pre-processing of real-time data streams can be incorporated into routine operations(6) with little effort.
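The release names the cleansing tasks (supplementing missing values, eliminating outliers and duplicates) but not the techniques. The sketch below swaps in three common textbook choices, exact-duplicate removal, median imputation, and Tukey's interquartile-range fences, which may differ from what the Service actually uses; all function names are hypothetical:

```python
from statistics import median, quantiles


def drop_duplicates(rows: list) -> list:
    """Eliminate exact duplicate records, keeping the first occurrence."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_median(rows: list, item: str) -> list:
    """Supplement missing values of `item` with the median of its observed values."""
    fill = median([r[item] for r in rows if r.get(item) is not None])
    return [{**r, item: fill} if r.get(item) is None else r for r in rows]


def drop_outliers(rows: list, item: str, k: float = 1.5) -> list:
    """Remove records whose `item` lies outside Tukey's fences Q1 - k*IQR, Q3 + k*IQR."""
    q1, _, q3 = quantiles([r[item] for r in rows], n=4, method="inclusive")
    iqr = q3 - q1
    return [r for r in rows if q1 - k * iqr <= r[item] <= q3 + k * iqr]


def cleanse(rows: list, item: str) -> list:
    """Apply deduplication, median imputation, and outlier removal in that order."""
    return drop_outliers(impute_median(drop_duplicates(rows), item), item)


rows = [
    {"id": 1, "temp": 18.0},
    {"id": 2, "temp": None},   # missing value
    {"id": 3, "temp": 20.0},
    {"id": 3, "temp": 20.0},   # duplicate record
    {"id": 4, "temp": 22.0},
    {"id": 5, "temp": 24.0},
    {"id": 6, "temp": 500.0},  # outlier
]
for row in cleanse(rows, "temp"):
    print(row)
```

Duplicates are removed first so that a repeated record is not counted twice in the median and quartiles computed afterwards.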
The Service streamlines a wide variety of data transformation and processing work and ultimately feeds high-quality data into a data lake(7). By enabling sophisticated data analysis and improving the input data for AI, IoT systems, and business systems, the Service facilitates the use of data across a wide range of applications.

Hitachi is already working on enhancements such as integration with Lumada Solution Hub(8), which uses features of the Lumada platform to deliver digital transformation quickly and easily. The Service will contribute strongly to enhancing collaboration with customers and partners, further facilitating customers' use of data, and creating new business value.

(1): A type of data processing. Cleansing involves such tasks as supplementing missing values and removing outlying data.
(2): A type of data processing. This processing involves eliminating or associating duplicate data.
(3): A collective name for solutions, services, and technologies that use Hitachi's leading digital technology to derive value from customer data and accelerate digital innovation.
(4): A program that incorporates the processing flow or procedures for performing data integration and cleansing.
(5): An abbreviation for Extract, Transform, Load. ETL software automates the extraction and transformation of various types of business data, including core-system data, based on pre-processing logic created by the user. Users of the Service can use Hitachi's Pentaho data integration and analysis platform or any other ETL tool of their choice.
(6): The same processing is applied on an ongoing basis to all data, including newly added and updated data.
(7): A repository for centrally storing large volumes of data in various formats from many data sources. A data lake can store structured, semi-structured, and unstructured data in a form that keeps it versatile for whatever uses may arise in the future.
(8): News release (March 18, 2019) "Hitachi Launches 'Lumada Solution Hub' to Advance and Facilitate Introduction of Lumada Solutions"

Key features of the Data Preparation Service

1. AI-driven analysis of data specifications and quality that supports data "understanding" by inferring item names and outliers

The Service provides several features that automatically analyze the information needed to understand data specifications. A "data item name inference" feature infers meaningful names implied by the data, such as "Speed" or "Latitude/Longitude". A "data profile" feature uses AI to analyze data features and trends, and graphically presents information about unnecessary data and data that needs to be unified or transformed. A "data relationship analysis" feature automatically derives, from features of the data, the relationships among data that must be considered when planning data integration, and proposes them to the customer. Customers can thus gain a precise grasp of data specifications and trends with relative ease, eliminating the work traditionally needed to "understand" data, such as interviewing business experts and data providers, reviewing specification documents, and assessing quality.
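As a rough illustration of item name inference and relationship analysis, the sketch below proposes candidate names from value ranges and scores how much the values in two columns overlap, which can hint at a join key for integration. The Service is described as using AI for these functions; the fixed rules, the Jaccard overlap measure, and all names here are assumptions, not Hitachi's method:

```python
from typing import Callable, List, Tuple

# Hypothetical hand-written rules: (candidate item name, test over observed values).
# The Service reportedly uses AI here; these fixed ranges are only illustrative.
RULES: List[Tuple[str, Callable[[list], bool]]] = [
    ("Latitude", lambda vs: all(-90.0 <= v <= 90.0 for v in vs)),
    ("Longitude", lambda vs: all(-180.0 <= v <= 180.0 for v in vs)),
    ("Speed", lambda vs: all(v >= 0.0 for v in vs)),
]


def infer_item_names(values: list) -> list:
    """Propose every candidate name whose rule accepts all observed values."""
    return [name for name, accepts in RULES if accepts(values)]


def value_overlap(a: list, b: list) -> float:
    """Jaccard similarity of two columns' distinct values (0.0 disjoint, 1.0 identical)."""
    sa, sb = set(a), set(b)
    if not (sa | sb):
        return 0.0
    return len(sa & sb) / len(sa | sb)


print(infer_item_names([139.76, 135.50]))
print(infer_item_names([-33.87, 35.68]))
print(value_overlap(["A01", "A02", "A03"], ["A02", "A03", "A04"]))
```

Returning several candidates rather than a single answer mirrors the idea of proposing names for the customer to confirm; the first example matches both Longitude and Speed because value ranges alone cannot tell them apart.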

2. Streamlining logic verification through sharing of data processing methods and provision of coding-less, graphical interfaces

The "pre-processing logic sharing" feature of the Service allows processing methods for data cleansing or integration to be shared. By letting a team or project efficiently share logic, whether standard logic, frequently used generic logic, or specialized logic provided by experts, this feature streamlines the assessment and verification of pre-processing logic and supplements the skills of individual users.
Because the series of tasks from understanding the data specifications to verifying the logic takes place in an intuitive graphical interface, the Service eliminates the labor-intensive coding that was previously required whenever logic needed to be verified.
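The release does not describe how shared logic is stored. One minimal way to model the idea, assuming an in-process registry in place of the Service's shared repository (the registry, decorator, and step names are hypothetical), is to publish each step under a name that others on the team can reuse:

```python
from typing import Callable, Dict, List

Step = Callable[[List[dict]], List[dict]]

# Hypothetical in-process registry standing in for a shared logic repository.
LOGIC_REGISTRY: Dict[str, Step] = {}


def register(name: str) -> Callable[[Step], Step]:
    """Decorator that publishes a pre-processing step under a shared name."""
    def wrap(func: Step) -> Step:
        if name in LOGIC_REGISTRY:
            raise ValueError(f"logic {name!r} is already registered")
        LOGIC_REGISTRY[name] = func
        return func
    return wrap


@register("drop_rows_without_id")  # e.g. generic logic reused across projects
def drop_rows_without_id(rows: List[dict]) -> List[dict]:
    return [r for r in rows if r.get("id") is not None]


@register("round_speed_1dp")  # e.g. specialized logic contributed by an expert
def round_speed(rows: List[dict]) -> List[dict]:
    return [{**r, "speed": round(r["speed"], 1)} if r.get("speed") is not None else r
            for r in rows]


def run_pipeline(step_names: List[str], rows: List[dict]) -> List[dict]:
    """Apply registered steps in order, looking each one up by its shared name."""
    for name in step_names:
        rows = LOGIC_REGISTRY[name](rows)
    return rows


print(run_pipeline(["drop_rows_without_id", "round_speed_1dp"],
                   [{"id": 1, "speed": 42.04}, {"id": None, "speed": 3.0}]))
```

Refusing to overwrite an existing name is one simple way to keep shared logic from being silently replaced; a production repository could add versioning and access control on top.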

3. Integration of verified pre-processing logic with ETL tools supports seamless transition to pre-processing of live data

The Service offers a "pre-processing logic export" function that links verified pre-processing logic with an ETL tool of the customer's choice. By eliminating the need to implement logic in the ETL tool separately for each type of data, this function makes it easier to incorporate pre-processing of data generated by various devices and systems into routine operations. Direct integration of logic with an ETL tool provides a seamless experience across the entire cycle of data understanding, verification of pre-processing logic, and transition to live operation, thereby facilitating the customer's use and application of data.
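The export format is not described in the release. The sketch below assumes a hypothetical, tool-neutral JSON description of verified steps that a scheduled job can replay on each new batch, which corresponds to the ongoing processing described in footnote (6). It is not the format used by Pentaho or any other real ETL product, and all names are illustrative:

```python
import json
from typing import Callable, Dict, List


def fill_missing(rows: List[dict], item: str, value: float) -> List[dict]:
    """Replace a missing value of `item` with a fixed value."""
    return [{**r, item: value} if r.get(item) is None else r for r in rows]


def filter_range(rows: List[dict], item: str, low: float, high: float) -> List[dict]:
    """Keep records whose `item` lies within [low, high]."""
    return [r for r in rows if low <= r[item] <= high]


# Operations an exported spec may reference (hypothetical vocabulary).
OPERATIONS: Dict[str, Callable[..., List[dict]]] = {
    "fill_missing": fill_missing,
    "filter_range": filter_range,
}


def export_logic(steps: List[dict]) -> str:
    """Serialize verified logic to a tool-neutral JSON document (hypothetical format)."""
    for step in steps:
        if step["op"] not in OPERATIONS:
            raise KeyError(f"unknown operation {step['op']!r}")
    return json.dumps({"version": 1, "steps": steps}, indent=2)


def apply_exported(spec: str, batch: List[dict]) -> List[dict]:
    """Replay the exported steps on a new batch, as a scheduled ETL job might."""
    for step in json.loads(spec)["steps"]:
        params = {k: v for k, v in step.items() if k != "op"}
        batch = OPERATIONS[step["op"]](batch, **params)
    return batch


spec = export_logic([
    {"op": "fill_missing", "item": "speed", "value": 0.0},
    {"op": "filter_range", "item": "speed", "low": 0.0, "high": 300.0},
])
# The same spec is applied unchanged to every newly arriving batch.
print(apply_exported(spec, [{"speed": None}, {"speed": 120.0}, {"speed": 999.0}]))
```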

For Business Inquiries on this Service

Hitachi Enterprise Application Service

About Hitachi, Ltd.

Hitachi, Ltd. (TSE: 6501), headquartered in Tokyo, Japan, is focusing on its Social Innovation Business, which combines its operational technology, information technology, and products. The company's consolidated revenues for fiscal 2018 (ended March 31, 2019) totaled 9,480.6 billion yen ($85.4 billion), and the company has approximately 296,000 employees worldwide. Hitachi delivers digital solutions utilizing Lumada in five sectors, Mobility, Smart Life, Industry, Energy, and IT, to increase its customers' social, environmental, and economic value. For more information on Hitachi, please visit the company's website at https://www.hitachi.com.
