
Transcript of KERNEL DEVELOPMENT, CSC585 Class Project, Dawn Nelson, December 2009.

Page 1:

KERNEL DEVELOPMENT

CSC585 Class Project
Dawn Nelson

December 2009

Page 2:

COMPARE TIMING AND JITTER BETWEEN A REALTIME MODULE AND NON-REALTIME MODULE

Are the results of using a realtime module worth the effort of installing RTAI?
What is the timing difference between realtime and non-realtime kernel modules for computation?
What is the jitter difference between realtime and non-realtime kernel modules for computation?
What is the jitter difference between realtime and non-realtime kernel modules for overall process time, with and without MPI?
What types of tasks are improved by using RTAI?

Page 3:

WHAT IS THE TIMING DIFFERENCE BETWEEN REALTIME AND NON-REALTIME KERNEL MODULES?

Page 4:

OVERALL PROCESS TIME COMPARISON FOR 8 NODES

Page 5:

WHAT IS THE JITTER DIFFERENCE BETWEEN REALTIME AND NON-REALTIME KERNEL MODULES FOR OVERALL PROCESS TIME?

Page 6:

SOURCE CODE WRITTEN

Kernel module implementing a char device read/write as a signal to perform the kernel task.
Kernel module implementing RTAI with a fifo and a semaphore as a signal to perform the kernel task.
Programs to use the kernel modules (a minimal usage sketch follows this list).
MPI programs to use the kernel modules.
Scripts to build and load both modules.
Scripts to run programs and save results.
Scripts to initiate MPI on all nodes (because mpdboot is unreliable and does not work for 8 nodes).
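A minimal user-space sketch of one such program, assuming the module has been loaded and a device node such as /dev/mmmodule has been created for major number 60 (the path and node creation are assumptions, not shown on the slides). A single read() triggers the in-kernel matrix multiply, and the module returns the elapsed time as the read return value.

/* Hypothetical user-space test program for the char-device module. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char buf[1];
    ssize_t elapsed;

    int fd = open("/dev/mmmodule", O_RDONLY);   /* assumed device node, major 60 */
    if (fd < 0) {
        perror("open /dev/mmmodule");
        return 1;
    }
    /* The read() call signals the module to run its matrix-multiply loop;
     * the module returns the measured time (ns) instead of a byte count. */
    elapsed = read(fd, buf, 1);
    printf("kernel matrix multiply took %zd ns\n", elapsed);
    close(fd);
    return 0;
}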

Page 7:

CHARACTER DEVICE DRIVER – READ FUNCTION

///read
ssize_t mmmodule_mmmdo(struct file *filp, char *buf, size_t count, loff_t *f_pos)
{
    int a[20][20], b[20][20], c[20][20];
    int i, j, k, extraloop, t2;
    RTIME t0, t1;

    t0 = rt_get_cpu_time_ns();

    // 50000 iterations for a good measurement
    for (extraloop = 0; extraloop < 50000; extraloop++) {
        // Matrix calculation block
        for (k = 0; k < 20; k++)
            for (i = 0; i < 20; i++) {
                c[i][k] = 0;
                for (j = 0; j < 20; j++)
                    c[i][k] = c[i][k] + a[i][j] * b[j][k];
            }
    }

    t1 = rt_get_cpu_time_ns();
    t2 = (int)(t1 - t0);

    // Changing reading position as best suits
    //copy_to_user(buf, mmmodule_buffer, 1);

    return t2;
}

Page 8:

CHARACTER DEVICE DRIVER - SETUP

// memory character device driver to do matrix multiply upon a call to it

#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>   // printk()
#include <linux/slab.h>     // kmalloc()
#include <linux/fs.h>       // everything
#include <linux/errno.h>    // error codes
#include <linux/types.h>    // size_t
#include <linux/proc_fs.h>
#include <linux/fcntl.h>    // O_ACCMODE
#include <asm/system.h>     // cli(), *_flags
#include <rtai_sched.h>

MODULE_LICENSE("GPL");

// Declaration of mmmodule.c functions
int mmmodule_open(struct inode *inode, struct file *filp);
int mmmodule_release(struct inode *inode, struct file *filp);
ssize_t mmmodule_mmmdo(struct file *filp, char *buf, size_t count, loff_t *f_pos);
void mmmodule_exit(void);
int mmmodule_init(void);

/* Structure that declares the usual file access functions */
struct file_operations mmmodule_fops = {
    read:    mmmodule_mmmdo,
    //write: mmmodule_write,
    open:    mmmodule_open,
    release: mmmodule_release
};

// Declaration of the init and exit functions
module_init(mmmodule_init);
module_exit(mmmodule_exit);

// Global variables of the driver
int mmmodule_major = 60;    // Major number
char *mmmodule_buffer;      // Buffer to store data

int mmmodule_init(void)
{
    int result;

    // Registering device
    result = register_chrdev(mmmodule_major, "mmmodule", &mmmodule_fops);
    if (result < 0) {
        printk("mmmodule: cannot get major number %d\n", mmmodule_major);
        return result;
    }

    // Allocating mmmodule for the buffer
    mmmodule_buffer = kmalloc(1, GFP_KERNEL);
    if (!mmmodule_buffer) {
        result = -ENOMEM;
        goto fail;
    }
    memset(mmmodule_buffer, 0, 1);

    printk("Inserting mmmodule module\n");
    return 0;

fail:
    mmmodule_exit();
    return result;
}
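The open, release, and exit functions declared above are not shown on the slides. A minimal sketch consistent with those declarations might look like this (the bodies are assumptions, not the author's code):

// Sketch of the remaining functions (assumed; not from the slides)
void mmmodule_exit(void)
{
    unregister_chrdev(mmmodule_major, "mmmodule");   // give back the major number
    if (mmmodule_buffer)
        kfree(mmmodule_buffer);                      // free the 1-byte buffer
    printk("Removing mmmodule module\n");
}

int mmmodule_open(struct inode *inode, struct file *filp)
{
    return 0;    // nothing to do per open
}

int mmmodule_release(struct inode *inode, struct file *filp)
{
    return 0;    // nothing to do per close
}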

Page 9:

REALTIME MODULE - READ

static int myfifo_handler(unsigned int fifo)
{
    rt_sem_signal(&myfifo_sem);
    return 0;
}

static void Myfifo_Read(long t)
{
    int i = 0, j = 0, k = 0, xj = 0;
    int a[20][20], b[20][20], c[20][20];
    char ch = 'd';
    RTIME t0, t1;

    while (1) {
        //rt_printk("new_shm: sem_waiting\n");
        rt_sem_wait(&myfifo_sem);
        rtf_get(Myfifo, &ch, 1);
        //rt_printk("got a char off the fifo... time to do matrix mult\n");
        t0 = rt_get_cpu_time_ns();
        //rt_printk("t0= %ld \n", t0);
        for (xj = 0; xj < 50000; xj++) {
            for (k = 0; k < 20; k++)
                for (i = 0; i < 20; i++) {
                    c[i][k] = 0;
                    for (j = 0; j < 20; j++)
                        c[i][k] = c[i][k] + a[i][j] * b[j][k];
                }
        }
        t1 = rt_get_cpu_time_ns();
        shm->t2 = t1 - t0;    // = (int *)t2;
    }
}
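From user space, the RTAI fifo appears as a character device, so the kernel task can be triggered by writing a single byte to it. A hedged sketch, assuming the fifo number Myfifo maps to /dev/rtf0 (the path is an assumption); the measured time would then be read back from the RTAI shared-memory segment rather than from the write return value.

/* Hypothetical trigger for the RTAI module via its fifo. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char ch = 'd';
    int fd = open("/dev/rtf0", O_WRONLY);   /* assumed RTAI fifo device */
    if (fd < 0) {
        perror("open /dev/rtf0");
        return 1;
    }
    /* One byte wakes myfifo_handler, which signals the semaphore
     * and lets Myfifo_Read run the matrix multiply. */
    write(fd, &ch, 1);
    close(fd);
    return 0;
}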

Page 10:

REALTIME MODULE - SETUP

static RT_TASK read;
#define TICK_PERIOD 100000LL    /* 0.1 msec (1 tick) */

int init_module(void)
{
    // shared memory section
    rt_printk("shm_rt.ko initialized: tick period = %ld\n", TICK_PERIOD);
    shm = (mtime *)rtai_kmalloc(nam2num(SHMNAM), SHMSIZ);
    if (shm == NULL)
        return -ENOMEM;
    memset(shm, 0, SHMSIZ);

    rtf_create(Myfifo, 1000);
    rtf_create_handler(Myfifo, myfifo_handler);
    rt_sem_init(&sync, 0);
    rt_typed_sem_init(&myfifo_sem, 0, SEM_TYPE);
    rt_task_init(&read, Myfifo_Read, 0, 2000, 0, 0, 0);

    start_rt_timer((int)nano2count(TICK_PERIOD));
    rt_task_resume(&read);
    return 0;
}
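The matching cleanup is not shown on the slides. A minimal sketch of what it would contain, using the standard RTAI teardown calls for the objects created in init_module (assumed, not the author's code):

// Sketch of a matching cleanup_module (assumed; not shown on the slides)
void cleanup_module(void)
{
    stop_rt_timer();                 // stop the periodic timer
    rt_task_delete(&read);           // remove the realtime task
    rt_sem_delete(&myfifo_sem);      // delete both semaphores
    rt_sem_delete(&sync);
    rtf_destroy(Myfifo);             // tear down the fifo
    rtai_kfree(nam2num(SHMNAM));     // release the shared-memory segment
}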

Page 11:

CONCLUSIONS

There are cases where RTAI improves timing and jitter: mostly longer-running tasks, widely distributed tasks, and deterministic tasks.

Accessing shared memory created using RTAI sadly slows the module to a crawl. My previous rt-module gave results of 140 milliseconds per 5,000 matrix multiplies; the new version gives 100 nanoseconds for 50,000 matrix multiplies. I can try physical memory mapping to see if performance improves.

I don't think modules were meant to be used for mass amounts of data, because of the slow transfer between user and kernel space via copy_to_user, shared memory, and copy_from_user.

For MPI, the main advantage of using RTAI is that the nodes all finish at nearly the same rate.
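One way that claim could be checked is with a small MPI wrapper that times the device read on every rank and reduces to the max and min; the spread across ranks is the per-run jitter. This is only a hedged sketch, not the project's actual MPI program, and /dev/mmmodule is an assumed device path.

/* Hypothetical MPI jitter check across the 8 nodes. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, fd;
    char buf[1];
    double t0, dt, dt_max, dt_min;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    fd = open("/dev/mmmodule", O_RDONLY);    /* assumed device node */
    if (fd < 0)
        MPI_Abort(MPI_COMM_WORLD, 1);

    MPI_Barrier(MPI_COMM_WORLD);             /* start all ranks together */
    t0 = MPI_Wtime();
    read(fd, buf, 1);                        /* trigger the kernel task */
    dt = MPI_Wtime() - t0;
    close(fd);

    MPI_Reduce(&dt, &dt_max, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    MPI_Reduce(&dt, &dt_min, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("finish-time spread across nodes: %g s\n", dt_max - dt_min);

    MPI_Finalize();
    return 0;
}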

Page 12:

LESSONS LEARNED

A kernel crash writes core dumps on all open windows.
A small tick period locks up the whole machine and is unrecoverable.
Fifos and semaphores work nicely and do not create race conditions.
Character device drivers work nicely but are a little more maintenance to set up and program.
These are my first modules ever written, including the rt one for the conference.
A profiler would be very useful for comparing performance instead of graphs and text.
I will soon be writing an RT module to read a synchro device every 12 milliseconds to try out the determinism of RTAI (a rough sketch follows this list).

1000 nanoseconds = 1 microsecond; 1000 microseconds = 1 millisecond; 1 millisecond = 1,000,000 nanoseconds.
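A rough sketch of what the planned 12-millisecond periodic read might look like with RTAI's periodic-task calls. This is an assumed design, not code that has been written; the synchro access itself is left as a placeholder comment.

// Sketch of a periodic RTAI task run every 12 ms (assumed, not yet written)
#include <rtai_sched.h>

#define SYNCHRO_PERIOD_NS 12000000LL    /* 12 ms */

static RT_TASK synchro_task;

static void synchro_loop(long arg)
{
    while (1) {
        /* ...read the synchro device here... */
        rt_task_wait_period();           /* sleep until the next 12 ms boundary */
    }
}

int init_module(void)
{
    RTIME period = nano2count(SYNCHRO_PERIOD_NS);

    rt_set_periodic_mode();
    start_rt_timer(period);
    rt_task_init(&synchro_task, synchro_loop, 0, 2000, 0, 0, 0);
    rt_task_make_periodic(&synchro_task, rt_get_time() + period, period);
    return 0;
}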

Page 13:

FUTURE WORK

There are very few prior projects or code examples done with RTAI (findable by Google, anyway).

The matrix multiply, even at 50,000 iterations, is not CPU-intensive enough to prove or disprove the advantages of RTAI.

Need to ask the physicists for some of their algorithms to crunch through the system; at the conference, it was the physicists who showed interest in RTAI.

Plan to use RTAI for its intended purpose of being deterministic.

Write up the results for a paper.

Page 14:

C107 8-NODE CLUSTER SETUP WITH CENTOS 5.3, RTAI AND MPICH2