
Hitachi


September 9, 2013

Report from Presenter

Worldcomp 2013 was held in the USA from July 22 to July 25, 2013. It consisted of 22 conferences, at which researchers discussed a wide range of topics such as computer design, parallel data processing, data mining, and artificial intelligence.


Fig. 1 The model of file system occupancy

I attended CDES (The 2013 International Conference on Computer Design), one of the conferences of Worldcomp 2013, and presented a method for a file-level de-duplication system. The following introduces the paper, entitled "SDD: Selective De-Duplication with Index by File Size for Primary File Servers".

In recent years, the amount of data stored on file servers used by end users has been increasing, and file systems have grown large as a result. Administrators therefore struggle to reduce the cost of managing these servers.


Fig. 2 Overview of the proposed method 1



Fig. 3 Overview of the proposed method 2

One approach to this problem is "file-level de-duplication". File-level de-duplication finds files that contain the same data and eliminates all but one copy, thereby removing redundant data. When file servers support file-level de-duplication, however, finding these files becomes a challenge because file systems store a very large number of files.

We proposed a method for extracting these files based on file size. The proposed method targets only files larger than a threshold size, because such large files occupy a large share of the space in file systems (Fig. 1). Moreover, the proposed method does not use hash functions, which other file-level de-duplication systems generally use. Instead, it uses only file size and byte-by-byte comparison to extract duplicated files (Fig. 2, Fig. 3). We evaluated the proposed method and showed that it is effective when an appropriate threshold is set.
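
The idea described above can be illustrated with a minimal Python sketch. This is not the implementation from the paper; it only assumes that candidate files are indexed by size, that files at or below the threshold are skipped, and that files of equal size are confirmed as duplicates by byte-by-byte comparison rather than by hashing. The function name `find_duplicates` and the parameter `threshold_bytes` are hypothetical names chosen for this illustration.

```python
import filecmp
import os
from collections import defaultdict


def find_duplicates(root, threshold_bytes):
    """Return groups of paths under `root` whose contents are identical.

    Only files larger than `threshold_bytes` are considered (an assumption
    based on the description in this report, not the paper's actual code).
    """
    # Step 1: index candidate files by size, skipping files at or below the threshold.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            if size > threshold_bytes:
                by_size[size].append(path)

    # Step 2: within each size bucket, confirm duplicates byte by byte
    # (no hash function is used).
    duplicate_groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        groups = []
        for path in sorted(paths):
            for group in groups:
                # shallow=False forces a comparison of the file contents.
                if filecmp.cmp(group[0], path, shallow=False):
                    group.append(path)
                    break
            else:
                groups.append([path])
        duplicate_groups.extend(g for g in groups if len(g) > 1)
    return duplicate_groups
```

In this sketch, a larger threshold reduces the number of files that have to be indexed and compared, while a smaller threshold finds more duplicates at a higher cost, which is the trade-off behind choosing an appropriate threshold.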

(By KAMEI Hitoshi)
