From: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez <at> hitachi.com>
Subject: [RFC PATCH 0/2] ivring: Add IVRing driver
Newsgroups: gmane.linux.kernel
Date: Tuesday 5th June 2012 10:49:54 UTC
Hi All,

The following patch set provides a new communication path, "IVRing", for
collecting kernel log or tracing data of guests on a host without using
I/O resources in a virtualization environment. The network is generally
used to collect log or tracing data after the data are output to a file.
However, since I/O resources such as network or block devices are shared
with other guests, these resources should not be consumed by logging or
tracing. Moreover, using network I/O imposes a high load on applications
in a guest because the data pass through many network layers. Therefore,
a communication method that collects the data without using I/O resources
is needed.

There are two requirements for collecting kernel log or tracing data on a host:
 (1) Minimize the overhead for user applications in a guest
     - do not use I/O resources
 (2) Implement a ring-like recording buffer
     - keep recording log data or trace data
To meet these requirements, IVRing, a ring-buffer implemented as a device
driver for guest OSs, is constructed on the Inter-VM shared memory
(IVShmem) device. IVShmem, implemented in QEMU, is a virtual PCI RAM
device backed by POSIX shared memory on the host. This device was
originally intended for low-overhead communication between two guests.
Here, on the other hand, it is used as a communication path between a
guest and a host for collecting data. IVRing buffers logging or tracing
data in a guest, and IVRing-reader, which opens the shared memory as
IVRing on the host, reads the data without copying memory between the
guest and the host. Thus, both requirements are met for collecting log or
tracing data.

We will talk about IVRing at LinuxCon Japan 2012:
	Title: Low-Overhead Ring-Buffer of Kernel Tracing &
	       Tracing Across Host OS and Guest OS
	Speakers: Yoshihiro Yunomae and Akihiro Nagai
You can download our slides about IVRing from the schedule page.

When a host collects tracing data of a guest, the performance of using
IVRing is compared with that of using the network.

The overview of this evaluation is as follows:
 (a) A guest on a KVM is prepared.
     - One physical CPU is dedicated to the guest as its virtual CPU (VCPU).

 (b) The guest starts writing tracing data to a SystemTap buffer.
     - The SystemTap probe points are all tracepoints of sched, timer,
       and kmem.

 (c) The tracing data are either recorded to IVRing, which shares memory
     with the host, or sent to the host via the network.
     - Three patterns, IVRing, NFS, and SSH, are measured.
       Each method is explained below.

 (d) While trace data are being written, Dhrystone 2 from UnixBench is
     executed as a benchmark tool in the guest.
     - Dhrystone 2 measures system performance by repeating integer
       arithmetic, and reports the result as a score.
     - Since a higher score means better system performance, a score lower
       than that of the bare environment indicates that some operation
       disturbs the integer arithmetic. We therefore define the overhead
       of transporting trace data as follows:
         overhead [%] = (1 - score / score in the bare environment) * 100

The performance of each method is compared as follows:
 [1] IVRing
     - A SystemTap script in a guest records trace data to IVRing.
     - An IVRing-reader on a host reads the data.
 [2] NFS
     - A directory in a guest is shared with that in a host via NFS.
     - A SystemTap script in a guest records trace data to a file
       in the directory.
 [3] SSH
     - A SystemTap script in a guest outputs trace data to a host through
       standard output via SSH.

The test environment is as follows:
 - host
   kernel: 3.3.1-5 (Fedora16)
   CPU: Intel Xeon (6core)
   Memory: 50GB

 - guest (only one guest is booted)
   kernel: 3.4.0+ (Fedora16)
   CPU: 1VCPU(dedicated)
   Memory: 2GB

The results of the three patterns, compared with the bare environment, are as follows:
	                Scores      overhead against [0] Bare
	 [0] Bare      29043600                -
	 [1] IVRing    28565398              1.6[%]
	 [2] NFS       22000508             24.3[%]
	 [3] SSH       10246792             64.7[%]
The overhead of IVRing is much lower than that of the other methods, which
use the network. This is because the IVRing method only records trace
data to a ring-buffer, whereas the other methods read trace data from a
SystemTap buffer into userland and send the data to the host via the
network. Therefore, IVRing minimizes the overhead of transporting trace
data from a guest to a host.
***How to use***
This section briefly explains how to use IVRing and IVRing-reader.

1. Prepare any distribution that includes a qemu-kvm binary of version
 0.13.0 or later. IVShmem was merged into qemu-kvm mainline in version
 0.13.0. The latest Fedora or Ubuntu releases can be used.

2. Boot a guest with the IVRing driver installed, using the IVShmem device
 option as follows:
	-device ivshmem,size=<size>,shm=<shm_obj>
shm_obj, the shared memory object path, is used later to share the memory
with the reader on the host. For example, the device option looks like this:
	-device ivshmem,size=2,shm=/ivshmem
 IVShmem also supports an interrupt mode using ivshmem_server, and this
IVRing driver can use it, as an experimental feature, to ring a doorbell
to the reader. This feature will be used in the near future.

3. Run IVRing-reader on a host.
 To share the memory region with IVShmem, the -s option must specify the
same shm_obj as in step 2, as below:
	./ivring_reader -m 2 -f /tmp/log.txt -S 10 -N 2 -s /ivshmem
Each option is described in detail in the 2nd patch.
Then, IVRing-reader starts to read data from IVRing, although the
ring-buffer is still empty:
	shared object size: 2097152 (bytes)
	Ring header is already initialized
	reader -1, writer 0, pos 20074a9f
	ivring_init_hdr: 0x7f128417d000
	Receive an interrupt 2
	Try to read buffer.
	Receive an interrupt 2
	no data
	__ivring_read ret=0
	Try to read buffer.
	no data
	__ivring_read ret=0
	Try to read buffer.

4. Start recording logging or tracing data on a guest.
 The IVRing driver provides the following API for kernel programming:
	ivring_write(int ID, void *buf, size_t size).

For kernel logging, it is used as follows:

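	int len;
	char buf[1024];

	/* format the message; scnprintf() returns the length actually stored */
	len = scnprintf(buf, sizeof(buf), "hogehoge\n" /* , args... */);
	ivring_write(0, buf, len);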

When SystemTap is used as a tracer, a sample script is as follows:

	%{
	extern int ivring_write(int id, void *buf, size_t size);
	%}

	function ivring_print(str:string) %{
		ivring_write(0, THIS->str, strlen(THIS->str));
	%}

	probe kernel.trace("sched*") {
		ivring_print(sprintf("%d: %s(%s)\n", gettimeofday_us(), pn(), $$parms))
	}
The script is executed as follows:
	stap -vg ivring_writer_sample.stp

 When data are successfully recorded to IVRing, the reader outputs the following:
	Try to read buffer.
	__ivring_read ret=4096
	__ivring_read ret=4096
	__ivring_read ret=313
	Try to read buffer.
	__ivring_read ret=4096
	__ivring_read ret=4096
	__ivring_read ret=632
	Try to read buffer.

***Future Work***
The following features will be implemented as future work:
 1. Notification from a guest to a host
 2. A user interface on a guest
 3. Support for existing in-kernel tracing systems
 4. Support for SMP environments
    (lockless ring-buffer like ftrace, one ring-buffer per CPU)
 5. Support for live migration

Thank you,


Yoshihiro YUNOMAE (2):
      ivring: Add a ring-buffer reader tool
      ivring: Add a ring-buffer driver on IVShmem

 drivers/Kconfig               |    1 
 drivers/Makefile              |    1 
 drivers/ivshmem/Kconfig       |    9 +
 drivers/ivshmem/Makefile      |    5 
 drivers/ivshmem/ivring.c      |  551
 drivers/ivshmem/ivring.h      |   77 ++++++
 tools/Makefile                |    1 
 tools/ivshmem/Makefile        |   19 +
 tools/ivshmem/ivring_reader.c |  516
 tools/ivshmem/ivring_reader.h |   15 +
 tools/ivshmem/pr_msg.c        |  125 +++++++++
 tools/ivshmem/pr_msg.h        |   19 +
 12 files changed, 1339 insertions(+), 0 deletions(-)
 create mode 100644 drivers/ivshmem/Kconfig
 create mode 100644 drivers/ivshmem/Makefile
 create mode 100644 drivers/ivshmem/ivring.c
 create mode 100644 drivers/ivshmem/ivring.h
 create mode 100644 tools/ivshmem/Makefile
 create mode 100644 tools/ivshmem/ivring_reader.c
 create mode 100644 tools/ivshmem/ivring_reader.h
 create mode 100644 tools/ivshmem/pr_msg.c
 create mode 100644 tools/ivshmem/pr_msg.h

Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory