Notes on Tensorflow and how it was used in ADTLib

It’s been almost 2 years since I became an amateur drummer (which apparently is also how long it’s been since my last blog post) and I have always felt that it would be great to have something that can provide me with drum transcriptions from a given music source. I researched a bit and came across a library that provides an executable as well as an API that can be used to generate drum tabs (consisting of hi-hats, snare and the kick drum) from a music source. It’s called ADTLib. It isn’t extremely accurate when one tests it, and I’m sure the library will only get better as more data sources become available to train the neural networks, but this was definitely a good place to learn a bit about neural networks and libraries like Tensorflow. This blog post is basically meant to serve as my personal notes on how Tensorflow has been used in ADTLib.

So to start off, the ADTLib source code actually doesn’t train any neural networks. What ADTLib essentially does is feed a music file through a pre-trained neural network to give us the automatic drum transcriptions in text as well as PDF form. We will need to start off by looking at two methods and one function inside https://github.com/CarlSouthall/ADTLib/blob/master/ADTLib/utils/__init__.py

  • methods create() and implement() belonging to class SA
  • function system_restore()

In system_restore() we initialise an instance of SA and call the create() method. There are a lot of parameters that are initialised when we create the neural network graph. We’ll not go into the details of those. Instead let’s look at how Tensorflow is used inside the SA.create() method. I would recommend reading this article on getting started with Tensorflow before going ahead with the next part of this blog post.

If you’ve actually gone through that article you’d know by now that Tensorflow builds graphs out of a series of Tensorflow operations. Data, in the form of multi-dimensional arrays called ‘tensors’, flows between the operations of the graph, hence the name ‘Tensorflow’. Great. So, getting back to the create() method, we find that first tf.reset_default_graph() is called. This clears the default graph stack and resets the global default graph.

Next we call a method weight_bias_init(). As the name suggests this method initialises the weights and biases for our model. In a neural network, weights and biases are parameters which can be trained so that the neural network outputs values that are closest to the target output. We can use ‘variables’ to initialise these trainable parameters in Tensorflow. Take these examples from the weight_bias_init() code:

  • self.biases = tf.Variable(tf.zeros([self.n_classes]))
  • self.weights = tf.Variable(tf.random_normal([self.n_hidden[(len(self.n_hidden)-1)]*2, self.n_classes]))

self.biases is set to a variable whose initial value is the tensor returned by tf.zeros() (a 1-dimensional tensor of 2 elements, all set to 0, since self.n_classes is set to 2 in the ADTLib code). self.weights is initialised to a variable defined by the tensor returned by tf.random_normal(), which returns a tensor of the given shape filled with random values (type float32) drawn from a normal distribution with a mean of 0.0 and a standard deviation of 1.0. These weights and biases would normally be trained later on, driven by some optimisation function. In ADTLib no training is actually done on these weights and biases; the parameters are loaded from pre-trained neural networks, as I’ve mentioned before. However, we need these tensors defined in order to be able to run the neural network on the input music source.
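To make the shapes concrete, here is a rough NumPy equivalent of that initialisation. Only n_classes = 2 comes from the ADTLib code; the last hidden-layer size of 100 is a made-up number for illustration:

```python
import numpy as np

n_classes = 2          # from the ADTLib code
n_hidden_last = 100    # hypothetical size of the last hidden layer

# biases start at zero, like tf.zeros([self.n_classes])
biases = np.zeros(n_classes, dtype=np.float32)

# weights are drawn from a standard normal distribution (mean 0.0,
# stddev 1.0), like tf.random_normal([...*2, self.n_classes]); the *2
# accounts for the concatenated forward and backward BRNN outputs
weights = np.random.normal(loc=0.0, scale=1.0,
                           size=(n_hidden_last * 2, n_classes)).astype(np.float32)
```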

Next we initialise a few ‘placeholders’ and ‘constants’. Placeholders and constants are again ‘tensors’ and resemble units of the graph. Example lines from the code:

  • self.x_ph = tf.placeholder('float32', [1, 1000, 1024])
  • self.seq=tf.constant(self.truncated,shape=[1])

Placeholders are used when a graph needs to be provided with external inputs; they can be given values later on. In the above example we define a placeholder that is supposed to hold ‘float32’ values in an array of dimension [1, 1000, 1024]. (Don’t worry about how I arrived at these dimensions. Basically, if you check the init() method of class SA, you’ll see that ‘self.batch’ is a structure of dimension [1000, 1024].) Constants, as the name suggests, hold constant values. In the above example, self.truncated is initialised to 1000. ‘shape’ is an optional parameter that specifies the dimension of the resulting tensor; here it is set to [1].
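As a small illustration of why the placeholder has that leading dimension of 1, here is what the feed shape looks like in NumPy (the [1000, 1024] batch shape is taken from the discussion above):

```python
import numpy as np

# a stand-in spectrogram batch of 1000 frames x 1024 frequency bins,
# matching the [1000, 1024] shape of self.batch
batch = np.zeros((1000, 1024), dtype=np.float32)

# the placeholder expects shape [1, 1000, 1024], so a leading batch
# dimension is added before feeding, which is exactly what
# np.expand_dims(self.batch, 0) does later in implement()
fed = np.expand_dims(batch, 0)
```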

Now, ADTLib uses a special type of recurrent neural network called a bidirectional recurrent neural network (BRNN). Here the neurons or cells of a regular RNN are split into two directions, one for the positive time direction (forward states) and another for the negative time direction (backward states). Inside the create() method, we come across the following code:
self.outputs, self.states= tf.nn.bidirectional_dynamic_rnn(self.fw_cell,
self.bw_cell, self.x_ph,sequence_length=self.seq,dtype=tf.float32)

This creates the BRNN from the two cells provided as parameters, the input data, the length of the sequence (which is 1000 in this case) and the data type. self.outputs is a tuple (output_fw, output_bw) containing the forward and the backward RNN output tensors.

The forward and backward outputs are concatenated and fed to the second layer of the BRNN as follows:

self.first_out=tf.concat((self.outputs[0],self.outputs[1]),2)
self.outputs2, self.states2= tf.nn.bidirectional_dynamic_rnn(self.fw_cell2,
self.bw_cell2,self.first_out,sequence_length=self.seq2,dtype=tf.float32)
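A small NumPy sketch of what that tf.concat call does to the shapes (the hidden size of 100 is a hypothetical value for illustration):

```python
import numpy as np

# hypothetical forward and backward outputs of the first BRNN layer,
# each of shape [batch=1, time=1000, hidden=100]
output_fw = np.zeros((1, 1000, 100), dtype=np.float32)
output_bw = np.zeros((1, 1000, 100), dtype=np.float32)

# tf.concat((self.outputs[0], self.outputs[1]), 2) joins the two along
# axis 2, doubling the feature dimension fed to the second BRNN layer
first_out = np.concatenate((output_fw, output_bw), axis=2)
```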

We now have the graph that defines how the BRNN should behave. The next few lines of code in the create() method deal with something called soft attention. This answer on Stack Overflow provides an easy introduction to the concept. Check it out if you want to, but I’ll not go much into those details. What happens essentially is that the forward and backward outputs from the second layer are again concatenated and then further processed to ultimately get a self.presoft value, which resembles (W*x + b), as seen below.

self.zero_pad_second_out=tf.pad(tf.squeeze(self.second_out),[[self.attention_number,self.attention_number],[0,0]])
self.attention_m=[tf.tanh(tf.matmul(tf.concat((self.zero_pad_second_out[j:j+self.batch_size],tf.squeeze(self.first_out)),1),self.attention_weights[j])) for j in range((self.attention_number*2)+1)]
self.attention_s=tf.nn.softmax(tf.stack([tf.matmul(self.attention_m[i],self.sm_attention_weights[i]) for i in range(self.attention_number*2+1)]),0)
self.attention_z=tf.reduce_sum([self.attention_s[i]*self.zero_pad_second_out[i:self.batch_size+i] for i in range(self.attention_number*2+1)],0)
self.presoft=tf.matmul(self.attention_z,self.weights)+self.biases
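The gist of those attention lines can be sketched with plain NumPy: a score is computed for each context offset, the scores are normalised with a softmax across the offsets, and the (shifted) features are summed with those weights. This is a simplification of the actual ADTLib code, and every size below except the 1000-frame length is hypothetical:

```python
import numpy as np

def softmax(z, axis=0):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

attention_number = 1            # gives 2*attention_number + 1 = 3 offsets
n_frames, n_feat = 1000, 64     # n_feat is a made-up feature size

# one score per offset per frame, standing in for the stacked matmuls
scores = np.random.randn(2 * attention_number + 1, n_frames, 1)

# normalise across the offsets (axis 0), like tf.nn.softmax(..., 0)
attention_s = softmax(scores, axis=0)

# offset-shifted features, standing in for the zero-padded slices
context = np.random.randn(2 * attention_number + 1, n_frames, n_feat)

# weighted sum over the offsets, like the tf.reduce_sum(..., 0) line
attention_z = (attention_s * context).sum(axis=0)
```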

Next we come across self.pred=tf.nn.softmax(self.presoft). This decides which activation function to use for the output layer; in this case the softmax activation function is used. IMO this is a good reference for different kinds of activation functions.
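For reference, the softmax simply turns the (W*x + b) scores into class probabilities that sum to 1. A minimal sketch, with made-up scores:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

# hypothetical presoft scores for one frame, one score per class
presoft = np.array([2.0, 0.5])
pred = softmax(presoft)       # probabilities over the 2 classes
```

The larger score always maps to the larger probability, which is why the raw presoft values can also be compared directly when only the most likely class is needed.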

We now move on to the SA.implement() method. This method takes input audio data, processed by madmom into a spectrogram. Next, self.saver.restore(sess, self.save_location+'/'+self.filename) loads the parameters from the pre-trained neural network files for the respective sounds (hi-hat/snare/kick). These Tensorflow save files can be found under ADTLib/files. Once the parameters are loaded, the Tensorflow graph is executed using sess.run() as follows:
self.test_out.append(sess.run(self.pred, feed_dict={self.x_ph: np.expand_dims(self.batch,0),self.dropout_ph:1}))

When this is executed we get the raw network outputs, and further processing (a step called peak picking) is done to get the onset data for the different percussive components.
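ADTLib’s actual peak-picking logic is more involved, but as a toy sketch, picking local maxima above a threshold looks something like this (the activations and threshold below are made up):

```python
import numpy as np

def pick_peaks(activation, threshold=0.5):
    """Toy peak picker (not ADTLib's actual algorithm): a frame is an
    onset if its activation exceeds a threshold and both neighbours."""
    onsets = []
    for i in range(1, len(activation) - 1):
        if (activation[i] > threshold
                and activation[i] > activation[i - 1]
                and activation[i] > activation[i + 1]):
            onsets.append(i)
    return onsets

# hypothetical per-frame onset probabilities from the network
act = np.array([0.1, 0.2, 0.9, 0.3, 0.1, 0.7, 0.95, 0.6, 0.2])
print(pick_peaks(act))  # -> [2, 6]
```

The frame indices returned this way are then converted to onset times using the frame rate of the spectrogram.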

I guess that’s it. There are a lot of details that I have omitted from this blog, mostly because they would make it way longer. I’d like to thank the author of ADTLib (Carl Southall), who cleared up some icky doubts I had about the ADTLib code. There is also a web version of ADTLib that has been developed with an aim to gather more data to train the networks better. So contribute data if you can!


Travelling to FUDCon, Pune 2015

Right now I’m at Kempegowda International Airport, Bengaluru looking forward to spending an exciting weekend at FUDCon, Pune. It seems my flight is delayed by almost an hour and this is a good time to have a look at the schedule more closely. Why am I looking forward to an ‘exciting’ weekend at Pune? Well, just have a look at the schedule and you’ll understand why. The schedule is packed with a wide range of talks and workshops on really interesting topics.

For a conference with such an interesting schedule, it’s important to prepare a schedule of your own to try and take back as much as possible. This is a screenshot of my schedule for the next 3 days. Oh! And I haven’t mentioned that I’m also looking forward to FUDPub, which is scheduled for Saturday evening IIRC. 😉

My FUDCon Pune schedule

I myself will be talking on kernel ABI breakages, along with an introduction to how kABI stability is currently tracked through genksyms and how an alternative could be possible through spartakus. That will be tomorrow at 12:30 pm. I really hope my talk can live up to the standards that will be set by the other speakers.

I guess it’s time to check in for my flight now. Also, the free WiFi has almost timed out. So see you at FUDCon, Pune 2015.

spartakus: Using sparse to have semantic checks for kernel ABI breakages

I have been working on a project I have named spartakus that deals with kernel ABI checks through semantic processing of the kernel source code. I made the source code available on GitHub some time back, and it does deserve a blog post.

Introduction
spartakus is a tool that can be used to generate checksums for exported kernel symbols through semantic processing of the source code using sparse. These checksums would constitute the basis for kernel ABI checks, with changes in checksums meaning a change in the kABI.

spartakus (which is currently a WIP) is forked from sparse and has been modified to fit the requirements of semantic processing of the kernel source for kernel ABI checks. Compiling it produces a new binary, ‘check_kabi‘, which can be used during the Linux kernel build process to generate the checksums for all the exported symbols. These checksums are stored in the Module.symvers file, which is generated during the build process if the variable CONFIG_MODVERSIONS is set in the .config file.

What purpose does spartakus serve?
In an earlier post I had spoken a bit about exported symbols constituting the kernel ABI and how its stability can be tracked through CRC checksums. genksyms has so far been the tool doing the job of generating checksums for the exported symbols that constitute the kernel ABI, but it has several problems/limitations that make it difficult for developers to maintain the stability of the kernel ABI. Some of these limitations, as determined by me, are mentioned below:

1. genksyms generates different checksums for structs/unions whose declarations are semantically similar. For example, take the following 2 declarations of the struct ‘list_head’:

struct list_head {
    struct list_head *next, *prev;
};

struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

Both declarations are essentially the same and should not result in a change in the kABI wrt ‘list_head’. sparse treats these 2 declarations as semantically the same, so different checksums are not generated.

2. For variable declarations with just an unsigned/signed specification and no type specified, sparse considers the type to be int by default. Hence ‘unsigned foo’ is converted to ‘unsigned int foo’ and then processed to get the corresponding checksum. genksyms, on the other hand, would generate different checksums for these 2 declarations even though they are semantically the same.

License
sparse is licensed under the MIT license which is GPL compatible. The added files as mentioned above are under a GPLv2 license since I have used code from genksyms which is licensed under GPLv2.

Source Code
Development on spartakus is in progress and the corresponding source code is hosted on Github.

Lastly, I know there will be at least one person who will wonder why the name ‘spartakus’. Well, I do not totally remember, but it had something to do with sparse and Spartacus. It does sound cool though, doesn’t it?

10 years of DGPLUG

I was in NIT Durgapur, West Bengal (my home away from home until a year and a few months ago) to attend an event celebrating 10 years of DGPLUG, held in the college campus from 29th August to 2nd September. Apart from the fact that being back in West Bengal for that short interval was a beautiful experience for me, I have returned to Pune with a bag full of memories from the event, which I believe was an amazing success. The goal of the event was to celebrate a decade of the DGPLUG community while holding talks and workshops to promote contributions to Free and Open Source software from the region. Many thanks to Red Hat and the TEQIP cell at NIT Durgapur for making this possible by providing the necessary funds for travel and accommodation. I’ll try to provide a day-by-day account of the event as I remember it:

Day 0: We (me, sayan, chandankumar, praveenkumar, rtnpro and pjp) reached Durgapur around 5:40 pm and were settled into the hotel within an hour. We took some rest while waiting for Kushal to arrive, after which we had a nice dinner at a local restaurant called Layeks. We discussed our plans for the next day during and after dinner. All we could do after a day of travel and a fulfilling dinner was sleep. We needed to start early the next day.

Day 1: After the initial formalities, the event started off with Kushal initiating the talk on the history of DGPLUG. I’m pretty sure the audience really loved that talk. Next there was some story-telling by the DGPLUG members where they spoke of how they were introduced to FOSS contributions, the DGPLUG community and how DGPLUG played an important role in each of our lives.

The audience was introduced to the DGPLUG summer training program that is organised each year following which the new members of DGPLUG were called in on stage to introduce themselves and speak on how the DGPLUG summer training helped.

After this Kushal demoed IRC live on the big screen which provided the audience with enough entertainment to get them charged up for the technical talk by Prasad J. Pandit (P J P) on iptables. His talk touched on concepts of networking and invited a lot of questions from the audience.

Prasad Pandit

P J P starting his talk on iptables

The day ended with pjp’s talk and we headed off to our hotels to freshen up for a team dinner.

Day 2: This day started with a well-promoted Python workshop, which was evident from the overwhelming participation. The D. M. Sen Auditorium Hall was packed with no seats left.

Full House 2

Fully packed auditorium for the Python workshop

The participants took to Python really well, and there were comparatively few mistakes from them even though the workshop pace was not too slow. During the course of the workshop the students were introduced to Vi/Vim, and almost all of them used it for editing. We realised the workshop was a real success when we found the auditorium hall filled to capacity even after a 1-hour break for lunch.

Day 3: Workshops were held on web development using Flask and on testing in Python. I did not have much to contribute to the workshop on Flask, so I instead gave some time to my personal agenda, which was the KF5 porting work for KStars that I had been postponing for long.

Day 4: This was the last day of the event for me. The day started with Kushal Das giving a short talk on the importance of providing documentation for the code we write. This provided the necessary build up to the workshop introducing the participants to reStructuredText format or RST. By lunch time, the participants had a grasp of the RST format. After lunch, Kushal started off with his workshop on documenting Python using Sphinx.

Over these past 4 days I interacted with 2 guys, Sourav Moitra and Raunak Pillai, both of whom showed a lot of interest in getting started with contributing to KDE. Sourav Moitra was interested in astronomy, and KStars interested him. He got KDE installed on his Fedora 20 system, and I helped him build and install KStars from source. He also set up KDevelop with KStars and learned how to use KDevelop as an IDE for development. He went through the source and asked questions to clear up his doubts.

The event managed to retain a very healthy crowd each day, which I consider an important parameter for measuring the success of an event. On top of that, the feedback we received from the participants made it clear that they wanted more such events organised in NIT Durgapur and surrounding locations.

P.S. : Akademy 2014 is about to start off and I feel so sad not being able to attend it. Anyway, wishing Akademy 2014 all the best and hope all the attendees have an awesome time!

Starving Developers

Please try and donate to make the Randa Sprint possible this year. 🙂

Yatta! (やった, Japanese for “we did it!”)

Please help!

KDE‘s Randa Meetings is quite possibly the best large-scale developer sprint ever. And you can help make it happen.

Imagine some 40 developers cramped together in a house in the middle of the Alps, living off code alone. Nothing could possibly go wrong…

Phonon, a pillar of our multimedia solutions, was revived in Randa. Kdenlive, our video editor, became 302% more awesome in Randa. The KDE Frameworks 5 movement seeking to make our awesome libraries more useful to all the world started in Randa. Amarok 2 was planned in Randa. Approximately a godzillion bugs were fixed in Randa.

Donate to the Randa Meetings 2014 now and get more free awesome! I personally have set my heart on making multimedia more awesome with easier APIs for developers and UPnP streaming support, also other stuff ( maybe a wallpaper that shows kittens wearing hats? #kdekittenhat)


Not so easy way to know about your Kernel ABI

My work in Red Hat is related to the Linux kernel ABI (or kABI). This has been an entirely new and exciting experience for me since I became a Red Hat associate in May last year. In this blog post and some posts subsequent to this I would like to talk about everything I have been learning so far.

First up, ABI, or the Application Binary Interface, is a low-level (binary-level) interface exposed by some application. The Linux kernel and its accompanying modules also expose (or export) a binary interface, which we refer to as the kernel ABI or kABI. It is important that this exported interface remain stable, because modules written for a particular kABI would need to be recompiled if the interface changes in a subsequent kernel version.

The exported binary interface that I keep referring to is nothing but a list of symbols which have been chosen to be exported; these are typically names of functions or variables. To find these exported symbols, you just need to ‘grep’ through the Linux source for “EXPORT_SYMBOL”. Following is a section of the grep output when I search for EXPORT_SYMBOL in the source code of kernel-3.14.5-200.fc20 built for Fedora:

sbairagy@dhcp193-120 ~/r/k/linux-3.14> grep -Rn "EXPORT_SYMBOL" .
./kernel/async.c:303:EXPORT_SYMBOL_GPL(async_synchronize_cookie_domain);
./kernel/async.c:316:EXPORT_SYMBOL_GPL(async_synchronize_cookie);
./kernel/gcov/base.c:52:EXPORT_SYMBOL(__gcov_init);
./kernel/gcov/base.c:62:EXPORT_SYMBOL(__gcov_flush);
./kernel/gcov/base.c:68:EXPORT_SYMBOL(__gcov_merge_add);
./kernel/gcov/base.c:74:EXPORT_SYMBOL(__gcov_merge_single);
./kernel/gcov/base.c:80:EXPORT_SYMBOL(__gcov_merge_delta);
./kernel/gcov/base.c:86:EXPORT_SYMBOL(__gcov_merge_ior);
./kernel/tracepoint.c:395:EXPORT_SYMBOL_GPL(tracepoint_probe_register);
./kernel/tracepoint.c:439:EXPORT_SYMBOL_GPL(tracepoint_probe_unregister);

All symbols enclosed in brackets after EXPORT_SYMBOL or EXPORT_SYMBOL_GPL are exported as a part of the binary interface exported by the kernel. When the kernel is built for Fedora, these exported symbols are stored in a file called Module.symvers. You can find this file for the kernel version you are running in /usr/src/kernels/<version>/. A section of the Module.symvers file for the kernel I am using is shown below:

sbairagy@dhcp193-120 ~/r/k/linux-3.14> vim /usr/src/kernels/3.14.8-200.fc20.x86_64/Module.symvers
0x00000000      tveeprom_read   drivers/media/common/tveeprom   EXPORT_SYMBOL
0x00000000      cx2341x_log_status      drivers/media/common/cx2341x    EXPORT_SYMBOL
0x00000000      ttm_bo_acc_size drivers/gpu/drm/ttm/ttm EXPORT_SYMBOL
0x00000000      inet_sendmsg    vmlinux EXPORT_SYMBOL
0x00000000      zs_get_total_size_bytes vmlinux EXPORT_SYMBOL_GPL

The 0th column contains checksums corresponding to the exported symbols, which you can find in column 1. You might notice that the checksums for all the symbols are 0x00000000. This is because the variable CONFIG_MODVERSIONS was not set during the kernel build process, which disables generation of checksums for the exported symbols.

These checksums are vital for catching kABI breakages. A kABI breakage occurs whenever changes are made to exported symbols. As a simple example, if we have an exported symbol ‘colours’ which refers to an enum ‘colours’ defined as

enum colours { RED, GREEN, BLUE };

and if we decide to change this enum by adding one more colour (say YELLOW) to it, then the change in the exported symbol ‘colours’ has resulted in a change in the kernel ABI. Whenever such a change happens for any symbol, the checksum corresponding to that symbol also changes, which leads to easy detection of a kernel ABI change (or breakage).
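The real checksums are CRCs computed by genksyms over the expanded type definitions, but the principle can be sketched with a toy checksum over the declaration text (MD5 here is just a stand-in, not the actual algorithm):

```python
import hashlib

def checksum(decl):
    """Toy stand-in for genksyms: checksum the text of a declaration.
    Any change to the declaration changes the checksum."""
    return hashlib.md5(decl.encode()).hexdigest()[:8]

old = checksum("enum colours { RED, GREEN, BLUE };")
new = checksum("enum colours { RED, GREEN, BLUE, YELLOW };")
assert old != new   # the changed enum shows up as a kABI change
```

A textual checksum like this would of course flag purely cosmetic edits too, which is exactly the genksyms limitation that motivates doing the checksumming semantically, as spartakus does.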

How are these checksums generated? Well, there is a tool called genksyms which is integrated into the Linux kernel build system and does the job of generating checksums for exported symbols during the build process, if the CONFIG_MODVERSIONS variable is set. In my subsequent posts I will write about the genksyms tool and how it actually generates these all-important checksums. Till then, cheers. 😉

How to get an IRC bouncer with ZNC for KDE developers

I had been postponing setting up an IRC bouncer for a long time in spite of its benefits, until recently, when I actually went through the process, which is really easy even for lazy people like me. This is all you really need to do:

  1. Go to https://sysadmin.kde.org/tickets/ and login using your identity.kde.org username and password (Here I am assuming you already have a KDE identity account)
  2. Go to ‘Submit A Ticket’ and select ‘IRC’ from the list of options available.
  3. File a bug with appropriate information for ‘Subject’, ‘Priority’ and ‘Freenode Nickname’ and provide some other relevant information in the text field.
  4. Once this ticket is closed by KDE sysadmin, which might take 1-2 days, you will receive a login id and password through email.
  5. Open https://bnc.kde.org:7778/ and login using the above login id and password.
  6. In ‘Your Settings’ you can configure your account which includes changing the password set by sysadmin, adding your Freenode network and then adding the channels you want to stay connected to.
  7. Just remember that when you add servers to your network which is Freenode, the format is ‘One server per line, host [[+]port] [password]’. For example:
    irc.freenode.net 6667 <your_password>
  8. Once you are done with this you need to configure your IRC client. For Konversation which is the client I use for IRC, all you really need to do is follow the steps here: http://community.kde.org/Sysadmin/BNC#Settings_for_Konversation. However, do note that ‘yourusername’ == <your_kde_identity_username>/Freenode.

That’s all you need to do! This is the wiki page that talks about setting this up for other IRC clients: http://community.kde.org/Sysadmin/BNC. So go ahead and set up your IRC bouncer; then you can always see what happened on a channel even while you were not connected.

Cheers!