
The era of "Big Data" has arrived. All around us, each and every day, networks are carrying huge and ever-increasing quantities of information. It is said that the number of networked devices in the world will soon exceed fifty billion. Some of Hitachi's young researchers are contributing to the development of high-throughput messaging technology that can process this colossal amount of data in a smart way.

Messaging technology for the Big Data era

How would you define high-throughput messaging technology?

Photo: MIZUTANI Izumi

MIZUTANI: As the name suggests, it's technology for processing messages quickly. By "messages" we mean pieces of information exchanged across a network, such as emails and text messages sent between mobile phones. Messaging is the process by which this information is relayed from sender to recipient.

Our department is mainly involved in the development of messaging systems technology for mobile communication carriers. When systems get bigger and more complex because they have to handle larger amounts of data, they need technology that can process messages in large quantities. We have developed high-throughput messaging technology to address this need.

Photo: KINOSHITA Masafumi

KINOSHITA: Before long, there will be over fifty billion networked devices in the world, and many more sensors will be connected to networks as well. As a result, more information and data will be exchanged between networked objects, based on concepts such as machine-to-machine (M2M) communication and the Internet of Things (IoT). Messaging technology also applies to information and data exchanged in this way.

Take your household electricity meter, for example. Mechanisms are being developed that allow meters to be read remotely over a network, without anyone having to visit a customer's home. As mechanisms like this become commonplace, the messaging workload will grow enormously. Against this background, the need for high-throughput messaging technology is going to keep growing.

So it's technology that can handle a sharp increase in the numbers of messages, right?

MIZUTANI: Yes. But we didn't just have to improve overall performance for processing huge numbers of messages at high speed; we also needed to make local improvements. In particular, we needed a way of controlling the flow of traffic during temporary spikes.

For example, there will probably be lots of people sending emails that say "Happy New Year!" as soon as the clock strikes twelve on New Year's Eve. Situations like this cause a phenomenon called "burst traffic", where peak traffic can reach several times the usual level. This can make it impossible for the system to process messages fast enough, resulting in delays and temporary restrictions on sending. If you simply scale the system up until it can absorb all of this burst traffic, you end up spending a lot of money on extra capacity that will hardly ever be used. A better solution is to build a system sized for ordinary traffic levels, but with control methods that let it adapt to irregular peaks without becoming overwhelmed.

KINOSHITA: High-throughput messaging technology takes a two-pronged approach. On the one hand, it increases the system's performance so it can cope with growth in the overall amount of data; on the other, it optimizes the control of temporary spikes in traffic.

Figure 1: The Big Data era and high-throughput messaging technology

The fusion of Hitachi's technology with distributed in-memory KVS

First of all, can you describe the mechanism used to process huge volumes of messages at high speed?

KINOSHITA: A conventional system consists of a server that collects and delivers messages, together with external storage that holds the messages temporarily. When the server receives a message, it first places it in a storage area called a "queue" in the external storage, so that the message can be delivered reliably even if a fault occurs in the network or server, and then delivers the message from this queue.

But with this configuration, the speed of collection and delivery is limited by the time taken to save messages to the external storage. You could add extra servers to handle greater numbers of messages, but that wouldn't be easy, because you would have to reconfigure the network and adjust all sorts of settings each time.

Figure 2: Differences between the conventional method and our new approach
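
To make the bottleneck concrete, here is a minimal sketch of that conventional flow; the file layout and function names are our own illustration under these assumptions, not Hitachi's actual implementation. Every message must be persisted to external storage before it can be acknowledged or delivered, so each one pays for a synchronous disk write:

import json
import os
import uuid

QUEUE_DIR = "/var/spool/mailqueue"   # hypothetical external-storage location

def receive(message):
    """Accept a message: persist it to the external queue before
    acknowledging, so it survives a server or network fault."""
    msg_id = str(uuid.uuid4())
    path = os.path.join(QUEUE_DIR, msg_id)
    with open(path, "w") as f:
        json.dump(message, f)
        f.flush()
        os.fsync(f.fileno())         # every message waits here; this
                                     # disk I/O caps the throughput
    return msg_id

def deliver(msg_id, send):
    """Deliver a queued message, then remove the persisted copy."""
    path = os.path.join(QUEUE_DIR, msg_id)
    with open(path) as f:
        message = json.load(f)
    send(message)                    # relay towards the recipient
    os.remove(path)                  # delete only after delivery succeeds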

KINOSHITA: To get around these problems, we decided to use a technique called a "distributed in-memory KVS". KVS stands for "key-value store", a way of storing and managing data with a very simple structure. It's highly scalable and allows extra servers to be added easily, which means the overall system performance can be adjusted flexibly.

Since this KVS processing is performed in memory, there is no need for disk access and messages can be processed much faster. However, precisely because everything is held in memory, there is a risk of losing data if a server goes down. We cover this risk with a distributed configuration in which the data is duplicated across multiple servers, improving reliability.

Figure 3: Distributed in-memory KVS
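
As a rough illustration of the idea (the hashing scheme, replica count, and class names below are assumptions made for this sketch, not details of the product), each key is written to a primary node and a replica chosen by hashing, so a write costs only memory operations, extra nodes can simply be appended to the cluster, and a single node failure loses no data:

import hashlib

class Node:
    """One in-memory KVS node; data lives in a plain dict, so no disk access."""
    def __init__(self, name):
        self.name = name
        self.store = {}

class DistributedKVS:
    """Toy distributed in-memory KVS: every value is kept on a primary
    node plus a replica, so one node failure loses no data."""
    def __init__(self, nodes, replicas=2):
        self.nodes = nodes          # adding a server = appending a Node
        self.replicas = replicas    # (a real system would also rebalance)

    def _owners(self, key):
        # Hash the key to choose the primary; replicas are the next nodes.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        start = h % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.replicas)]

    def put(self, key, value):
        for node in self._owners(key):
            node.store[key] = value     # pure in-memory write

    def get(self, key):
        for node in self._owners(key):  # fall back to the replica if
            if key in node.store:       # the primary has been lost
                return node.store[key]
        return None

kvs = DistributedKVS([Node("a"), Node("b"), Node("c")])
kvs.put("msg:42", "Happy New Year!")
print(kvs.get("msg:42"))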

Doesn't this simple structure result in a system with less functionality than one that uses external storage?

KINOSHITA: That's right. Because KVS manages data in such a simple way, it offers very little functionality. For example, relational databases provide exclusive control, which maintains data integrity by preventing data from being accessed by other servers while it is being updated. KVS has no such function. It doesn't even store messages temporarily in a queue.

So in order to apply KVS to messaging systems, we had to develop our own set of functions for messaging. You could say that KVS is a bit like a Formula 1 racing car. An ordinary car comes equipped with all sorts of features, whereas a Formula 1 car is specialized to go as fast as possible and consists of little more than an engine mounted on a chassis. In the same way, instead of using a database with a full set of functions, we use KVS because it's specialized for high-speed processing and scalability, and we add only the functions that are absolutely necessary.
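
To give a feel for "adding only what is necessary", here is a hedged sketch of queue semantics layered on a bare key-value store; the two-counter design and the single-writer assumption are ours for illustration, not details of Hitachi's implementation:

class KVSQueue:
    """Minimal FIFO queue built from nothing but get/put on a KVS:
    two counters plus one key per message. Because the KVS has no
    exclusive control, this sketch assumes a single writer per queue;
    a real system would need an atomic update primitive on top."""
    def __init__(self, kvs, name):
        self.kvs, self.name = kvs, name
        kvs[f"{name}:head"] = 0             # next message to deliver
        kvs[f"{name}:tail"] = 0             # next free slot

    def enqueue(self, message):
        tail = self.kvs[f"{self.name}:tail"]
        self.kvs[f"{self.name}:{tail}"] = message
        self.kvs[f"{self.name}:tail"] = tail + 1

    def dequeue(self):
        head = self.kvs[f"{self.name}:head"]
        if head == self.kvs[f"{self.name}:tail"]:
            return None                     # queue is empty
        message = self.kvs.pop(f"{self.name}:{head}")
        self.kvs[f"{self.name}:head"] = head + 1
        return message

kvs = {}                                    # stands in for the distributed store
q = KVSQueue(kvs, "outbox")
q.enqueue({"to": "alice@example.com", "body": "Happy New Year!"})
print(q.dequeue())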

So by using distributed in-memory KVS with a few additional functions, we can achieve high performance and high reliability. When we applied this technique to a messaging system for mobile phones, we were able to handle the delivery of up to 13 million messages per hour on each mail server. This performance is 180 times faster than the open-source sendmail utility that many businesses rely on. It's even four times faster than Hitachi's previous technology.

Optimizing the flow control of received data

How does this system deal with temporary traffic spikes?

MIZUTANI: Well, if we're talking about email, incoming traffic is spread across multiple mail servers by a load balancer. When a large amount of data arrives all of a sudden, flow control can be used to avoid overloading each mail server, by returning an error response that stops new data from arriving.

Previously, this control was based on the number of connections to each mail server: when the number reaches a regulation value, the server is judged to be heavily loaded, and any further connections are met with an error response.

In practice, however, the load on a mail server can vary according to traffic patterns such as the number of recipients and the mail size, even if the number of connections is the same.

For example, a server can be heavily loaded by a small number of emails if each one contains a lot of data or is addressed to a large number of recipients, each requiring delivery processing. Conversely, if each email is small, the server may not need to be regulated even when the number of connections exceeds the regulation value. So the number of incoming connections alone doesn't provide enough information for deciding when to regulate traffic.

Figure 4: Problems of the conventional method
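
The conventional rule fits in a few lines; the parameter name and response codes below are illustrative. Note that nothing in the rule considers message size or recipient count, which is exactly the blind spot just described:

MAX_CONNECTIONS = 100      # the conventional "regulation value" (hypothetical)

active_connections = 0

def on_new_connection():
    """Conventional flow control: refuse as soon as the connection
    count reaches the regulation value, regardless of actual load."""
    global active_connections
    if active_connections >= MAX_CONNECTIONS:
        return "451 temporarily unavailable"   # error response to sender
    active_connections += 1
    return "250 OK"

# Blind spot: 50 connections each carrying a huge mail to hundreds of
# recipients stay well under MAX_CONNECTIONS yet overload the server,
# while 150 tiny mails would be refused even though they are harmless.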

MIZUTANI: So instead of counting connections, our new system monitors the load on the mail server itself. Specifically, since disk I/O is the biggest bottleneck in mail server loading, we start restricting the reception of traffic when the disk I/O load reaches a threshold value. We also took steps to avoid applying traffic regulation too aggressively.

Avoiding overly aggressive regulation

You avoid making excessive use of traffic regulation? How does that work?

MIZUTANI: Conventional systems use a drastic form of regulation: as soon as the number of connections exceeds the regulation value, the server refuses to accept any data until the number of connections has fallen again. This can lead to over-regulation, where the server returns errors for mail it could actually have handled.

KINOSHITA: Small traffic bursts often last only a moment. Suppose three times the normal amount of traffic arrives in one second, but no traffic at all arrives in the next second. The average traffic level over this two-second interval is only 1.5 times normal. But if regulation kicks in as soon as the system detects the threefold increase in the first second, the system may refuse traffic that it could in fact have processed without difficulty. So simple regulation has several drawbacks.

MIZUTANI: Our new technique decides when to impose regulation after looking at the situation over a longer period of time, so as to avoid issuing errors for messages that could have been processed. This allows processing to continue with the bare minimum of regulation.

This is implemented by setting a fixed regulation judgment interval and monitoring the cumulative disk I/O load during this interval (Fig. 5(1)). Regulation kicks in when this cumulative value exceeds the regulation judgment interval multiplied by a threshold value (Fig. 5(2)).

We also arranged for the regulation to become gradually stricter by reducing the regulation value as time passes (Fig. 5(3)). If the disk I/O load decreases while the regulation is being adjusted in this way, the regulation can be released. The judgment of when to release the regulation is made in a similar way, based on a fixed regulation release judgment interval: when the cumulative disk I/O load during this interval falls below the release judgment interval multiplied by the threshold value (Fig. 5(4)), the regulation is released.

Figure 5: Regulation mechanism
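
Putting the pieces together, here is a hedged sketch of the judgment logic as we read it from Figure 5; the sampling model (one disk I/O load sample per second, expressed as a ratio of capacity), the intervals, the threshold, and the tightening step are illustrative assumptions rather than the production values:

import random

class FlowRegulator:
    """Regulation driven by cumulative disk I/O load (cf. Fig. 5)."""

    def __init__(self, judge_interval=10, release_interval=10,
                 threshold=0.8, tighten_step=0.05):
        self.judge_interval = judge_interval      # Fig. 5 (1)
        self.release_interval = release_interval
        self.threshold = threshold                # load ratio, 0.0 to 1.0
        self.tighten_step = tighten_step          # Fig. 5 (3)
        self.samples = []                         # (time, load) history
        self.regulating = False
        self.regulation_value = 1.0               # fraction of traffic accepted

    def _cumulative(self, now, span):
        return sum(load for t, load in self.samples if now - t < span)

    def observe(self, now, io_load):
        """Feed one load sample and update the regulation state."""
        self.samples.append((now, io_load))
        horizon = max(self.judge_interval, self.release_interval)
        self.samples = [(t, l) for t, l in self.samples if now - t < horizon]
        if not self.regulating:
            # (2) regulate once cumulative load over the judgment
            # interval exceeds interval x threshold
            if self._cumulative(now, self.judge_interval) > \
                    self.judge_interval * self.threshold:
                self.regulating = True
                self.regulation_value = self.threshold
        else:
            # (3) tighten gradually: accept a smaller fraction over time
            self.regulation_value = max(
                0.0, self.regulation_value - self.tighten_step)
            # (4) release once cumulative load over the release
            # interval falls back below interval x threshold
            if self._cumulative(now, self.release_interval) < \
                    self.release_interval * self.threshold:
                self.regulating = False
                self.regulation_value = 1.0

    def accept(self):
        """Admit everything when unregulated; otherwise admit a
        fraction that shrinks as the regulation value decreases."""
        return (not self.regulating) or random.random() < self.regulation_value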

Did the research go smoothly?

MIZUTANI: We made good progress up to the point where we built a prototype based on this idea, but when we actually got it running, we found it difficult to determine the values of parameters like the regulation value and the judgment intervals. In the end, the only way to set them was through trial and error.

To verify the system, we constructed an environment close to actual operating conditions, using components such as software for sending emails, and tested the system with a variety of traffic patterns. Including the fine-tuning after the system had been presented to our customer, this took a month or so.

As a result, we were able to arrive at a mechanism that reduces the average response time by up to 91% compared with conventional systems.

Figure 6: Evaluation results (average response time)

The appeal of research that can be presented directly to customers

Did the flow control of mail work properly in the production environment?

Photo: KINOSHITA Masafumi

KINOSHITA: Yes, without a hitch. I monitored the flow control in the production environment over New Year's Eve and into the New Year, and as expected there was a huge surge in traffic. The system worked well without any particular problems. It felt really great to make a real improvement to our customer's system.

MIZUTANI: I was on standby at home that night, but I was still nervous. When I heard everything had worked without problems, it was a great relief.

You're keen to get a lot more customers using this system, aren't you?

Photo: MIZUTANI Izumi

KINOSHITA: Absolutely. Our research on high-throughput messaging technology is mainly geared towards addressing the needs of our customers, but as I said earlier, the number of network-connected devices will increase rapidly in the future. I would like to see new applications introduced wherever possible.

MIZUTANI: Flow control is currently specialized for mail server applications, but I hope that in the future it will also be applied to many other fields, such as sensor data. It would also be interesting to consider applications beyond messaging, like Big Data analysis and processing.

The results of this research are already being put to practical use, which is encouraging for me as a researcher. Seeing our results used in the real world gives me confidence.

KINOSHITA: For me, it's also fun to present our research results directly to the customer. People tend to think that researchers spend all day in a laboratory, but in fact I work on the business side as well as the research side. I really feel that we're making a useful contribution to the world. It's very rewarding work.

(Publication: June 11, 2014)

Professional affiliation and official position are at the time of publication.