My Quest for Free Licensed Japanese Pop Music with Wacca.Fm's XMLRPC API and Python's xmlrpclib

My Quest for Free Licensed Japanese Pop Music with Wacca.Fm's XMLRPC API and Python's xmlrpclib


This is my story about searching for Japanese pop music under a free culture license. It's a little tricky, because the best sites for this are of course, in Japan, and not well advertised on the English web. I discovered how to use Python's XMLRPC library to run searches using the web API for a Japanese music sharing site called "Wacca". The results were very interesting -- I found some of what I was looking for, though not all.

Making Movies with Free Software

This article is part of an on-going series on the challenges I've faced in producing two free-licensed movies, Marya Morevna, through the Morevna Project and Lunatics, which we are working on as Anansi Spaceworks.

It all started when I was looking for character theme music on Jamendo for Lunatics. One of the characters is Japanese, and while she probably doesn't listen to J-Pop or Enka much herself (she's more of a classical piano type), we felt there might be some times when the cultural reference would be appropriate.

Clearly what I needed was to find a Japanese music sharing site

Well, as it turns out, Jamendo's selection of Japanese popular music is poor. I found three actual J-Pop albums under By-NC-SA licenses (which I can't use as is, and, considering the state of my Japanese, negotiating for a re-licensed track might be beyond my capabilities). In fact there is only one "single" track ("Kotoba Wa Sonomama") of J-Pop under a By-SA license, by "Hanadix" (though it's catchy, I admit). I suppose this is not so surprising, Jamendo is a French website. Their collection of American Country and Western music isn't that impressive either (perhaps I'll tackle in another column).

Finding a Japanese Music Site

Of course, I also did some general websearches on Enka and Creative Commons licenses (By and By-SA, specifically). I found nothing this way, and of course, lots of useless links -- it's not a narrow enough way to search, because even when these terms occur together on a page, it doesn't mean you found a music download under that license. Of course, I had to do these searches in English and Japanese. For the record, "Enka" is spelled in Japanese, "演歌", Creative Commons is "クリエイティブコモンズ" (that's just a transliteration of the English into katakana characters) and Attribution-ShareAlike is "表示 - 継承".

A few terms of Japanese used in this column -- just in case your browser doesn't render the Japanese text correctlyA few terms of Japanese used in this column -- just in case your browser doesn't render the Japanese text correctly

I think the best description of Enka that I can provide is that it is the Japanese equivalent of Country and Western music ("The music of pain" as Joss Whedon once described it). Here's a downloadable example (though it's By-NC licensed). It serves much the same emotional purpose in Japan: nostalgia for a past way of life, loves lost, and so on. If you sing a "somebody-did-somebody-wrong" song in Japanese, it'll probably be Enka. It's frequently found its way into Anime as comic exaggeration of a characters' miserable circumstances. Occasionally, it even gets used without irony.

Wacca.fm is a Japanese music sharing site. It's similar in concept to Jamendo, but obviously it has a lot more Japanese musicWacca.fm is a Japanese music sharing site. It's similar in concept to Jamendo, but obviously it has a lot more Japanese music

Clearly what I needed was to find a Japanese music sharing site, that would be much more likely to have a wider variety of less-westernized music. After doing a number of searches for meta-sources (by which I mean sources that list sources), I did finally find on: Wacca. It was a challenge to navigate since all of the buttons are labeled in Japanese (there is no "English" button). There are a number of tricks for navigating such sites. I should probably mention a few:

  • Pay attention to your browser's status bar as you mouse-over buttons, URLs often use English words, even when the site language is different
  • You can always give the URL to Yahoo's "Babelfish" translation service. This will machine-translate the text on the page (though not the graphical buttons)
  • Put yourself in the shoes of the designer -- where would you put the button you need to find?
  • Be sure to check the original page, especially if you can read the foreign language at all. Sometimes Babelfish makes unreadable hash out of the text. It's only a machine.

It was a challenge to navigate since all of the buttons are labeled in Japanese

This part was really fun. For one thing, I finally did find Enka and a variety of other styles of Japanese music available for download (which is progress -- I'd never found it in my previous attempts). But indeed, the great majority of the music is under non-commercial or no-derivatives licenses. It's a very inefficient strategy to search for the music you want and only then read the license. So I needed to be able to search by license (I described how to do this at Jamendo in a previous column). However, Wacca is not so convenient -- it's possible to limit a search to "Creative Commons only", but there's no way to limit it to the much-less-used free licenses (it's clear to me that Wacca, and probably the Japanese web in general, is a few years behind the West in adopting free culture licenses; which is not terribly surprising, given that the Creative Commons is an American/English-language organization).

I found this Enka track with just a few minutes of searching using the web interface, but there was no obvious way to limit the search to specific CC licensesI found this Enka track with just a few minutes of searching using the web interface, but there was no obvious way to limit the search to specific CC licenses

It is possible to search the site using Google, using the Creative Commons license name in Japanese, which does occur on the song's page (unlike Jamendo, uses a per-song licensing model (rather than per-album). Wacca's approach actually makes more sense to me here).

Using the Web API

However, I did notice a little button on the help page that says APIについて ("about the API"). Ooh. API. I can work with that. Using Babelfish to translate the API documentation pages into English (and referencing the originals), I was able (after a few false starts) to work out how to use Wacca's XMLRPC API to do the exact searches I wanted. Now we're cookin'!

You can use Yahoo's Babelfish service to machine-translate a web page if you need toYou can use Yahoo's Babelfish service to machine-translate a web page if you need to

Now of course, I always find it a little daunting when I realize I'm going to have to write a program to do what I want. But in this case, it's not so bad

Now of course, I always find it a little daunting when I realize I'm going to have to write a program to do what I want. But in this case, it's not so bad. All I need is a little ad hoc script. In fact, I really could (and in fact, did) do all of this stuff in the Python interactive interpreter. I'm going to keep this dead simple, because if I start trying to trick everything up to make it generally usable, I'll just get bogged down and never finish it. So here's what I did, and if you want to make a general tool from it, be my guest.

Top page of the API documentation -- original on the left, and Babelfish translation into English on the rightTop page of the API documentation -- original on the left, and Babelfish translation into English on the right

As is common around the world, the API itself uses English words. I'm not sure about the politics of this, or how other people feel about it, but I have to say that as a native speaker of English, I am truly grateful for this fact. So, indeeed, the Wacca API is in English, though the data we'll receive will be in UTF-8 encoded Japanese.

First of all, of course, I have to go find the Python documentation for the xmlrpclib module and learn how to use it. This is not as well-written as I would like, which is responsible for my false starts. But let me demonstrate how to use it to retrieve Wacca's "genre list". Wacca's API uses "genre" and "sub_genre" to identify songs in its database, and these are just integers. The genre list will tell me what the numbers actually mean.

>>> import xmlrpclib
>>> wacca = xmlrpclib.ServerProxy('http://a.wacca.fm/api/xmlrpc')
>>>
>>> genrelist = wacca.song.getGenreList()
>>>
>>> genrelist_text = "LIST OF GENRES DEFINED at wacca.fm (genre, sub_genre, genre_name)\n"
>>> for genre in genrelist:
...     genrelist_text += "%d.%d=%s\n" % (genre['genre_id'], genre['sub_genre_id'], genre['genre_name'])
...
>>> open("wacca_genrelist.txt", "wb").write(genrelist_text.encode("utf-8"))
>>>                                        

The first two lines import the xmlrpclib module and establish a proxy object for the remote server. This does not actually make any outside contact. The next line does that, by calling the "song.getGenreList()" method on "wacca", I'm really making a database query to the server. So this line takes a few seconds to run.

The result, genrelist, is a list of python dictionary objects with the appropriate mappings.

After that, I simply collate the data to create a readable text list and dump that out to a file. Of course, I could just print out the list to the terminal, but I want to save this list to a file so I can refer to it later.

I won't reproduce the whole file that results here, but here's the part I was interested in:

LIST OF GENRES DEFINED at wacca.fm (genre, sub_genre, genre_name)
1.0=ポップス
1.1=ポップス
1.2=フォーク
1.3=歌謡曲
1.4=演歌
1.5=童謡
2.0=ロック
2.1=ロック
2.2=ハードロック

...

Now, of course, the names are in Japanese, but I can read some of these. In English, these are "pop" (or "poppsu", since it really means something a little different in Japanese), "folk", "kayoukyoku" (another older style of Japanese pop music distinct from enka mainly in singing style), "enka", one I don't recognize, and of course, "rock" and "hard rock". It's primarily the "1" genres that I'm interested in searching (many of the other genres are Western genres that are better represented on Jamendo anyway -- not that I didn't look, but I won't go into that here).

The search interface is more interesting, and requires me to pass XMLRPC parameters

The search interface is more interesting, and requires me to pass XMLRPC parameters. I found the documentation very confusing on this point, but in fact, what you are supposed to do is to pack the parameters into a Python dictionary and use that as the argument to the remote method. The Wacca API for the song.Search method provides the definitions for the parameters I'm using here.

For the record, you can sort Creative Commons licenses completely with the interface, although it may not seem entirely intuitive. All have the value of copyright set to "creativecommons" (many Wacca tracks do not use Creative Commons licenses at all). Here's the breakdown:

License CC-By CC-By-SA CC-By-ND CC By-NC-SA CC By-NC-ND
copyright_commercial yes yes yes no no
copyright_modifications yes share no share no

Sorting out the Creative Commons licenses in the Wacca search interface

Now it's possible to answer my question about whether By-SA Enka exists:

>>> bysa_enka = wacca.song.Search({'genre':1, 'sub_genre':4, 'copyright':'creativecommons', 'copyright_commercial':'yes', 'copyright_modifications':'share'})

Unfortunately, what you get is a traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/xmlrpclib.py", line 1147, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.5/xmlrpclib.py", line 1437, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.5/xmlrpclib.py", line 1201, in request
    return self._parse_response(h.getfile(), sock)
  File "/usr/lib/python2.5/xmlrpclib.py", line 1340, in _parse_response
    return u.close()
  File "/usr/lib/python2.5/xmlrpclib.py", line 787, in close
    raise Fault(**self._stack[0])
xmlrpclib.Fault: <Fault -2: 'NO_RESULT'>

So it seems that the answer is "no". A similar search for By-licensed Enka...

>>> by_enka = wacca.song.Search({'genre':1, 'sub_genre':4, 'copyright':'creativecommons', 'copyright_commercial':'yes', 'copyright_modifications':'yes'})

gives the same results. So it seems that the wonderful world of Japanese enka has not yet been touched by the concept of true free licensing. There are quite a few under the By-NC license, though:

>>> bync_enka = wacca.song.Search({'genre':1, 'sub_genre':4, 'copyright':'creativecommons', 'copyright_commercial':'no', 'copyright_modifications':'yes', 'limit':1000})
>>> len(bync_enka)
103

So should I decide it's worth the trouble to try to negotiate for a relicense of a track in a foreign language, I can at least find them.

I can look at some details on one of these tracks, like so:

>>> print "(%s, %s) -> %s" % (bync_enka[17]['artist_name'], bync_enka[17]['song_title'], bync_enka[17]['url'])
(京鈴, 瀬戸内無情) -> http://wacca.fm/m/19278

And I can simply paste the URL for the track into my browser location bar if I want to take a closer look.

There are some other types of music that are available under free licenses, so I might like to give those a listen:

>>> by_kayoukyoku = wacca.song.Search({'genre':1, 'sub_genre':3, 'copyright':'creativecommons', 'copyright_commercial':'yes', 'copyright_modifications':'yes'})
>>> len(by_kayoukyoku)
1
>>> 
>>> print "(%s, %s) -> %s" % (by_kayoukyoku[0]['artist_name'], by_kayoukyoku[0]['song_title'], by_kayoukyoku[0]['url'])
(奥様レコード, なめこ汁) -> http://wacca.fm/m/9181

Ha. There's one kayoukyoku track. It's here.

The one kayoukyoku track under a free license. It has toilet paper on the cover. I don't know why, eitherThe one kayoukyoku track under a free license. It has toilet paper on the cover. I don't know why, either

Ah Well, Just Exploring

Well, it appears my quest for free-licensed Enka is unfulfilled, but there is plenty of other Japanese pop on the site. Searching under "poppsu", for example turns up 47 By-licensed tracks and 29 By-SA-licensed tracks. This is much better than my results with Jamendo. In total there are 168 "By-SA" tracks of all genres and 205 "By" tracks of all genres on Wacca (on the day I did this search, of course -- the results will naturally change over time).

It is of course, a very tiny drop in the bucket amongst all the other tracks on Wacca, which is the reason this API search was necessary to find out what I wanted to know. But it's still interesting stuff. I'm currently working on a script to collect this information in a more accessible way. Fun stuff!

Classifying Licenses from Wacca Results

By way of postscript, I do want to share one more handy function I wrote for working with this data. The data returned by the search calls is a list of dictionaries containing data about each song. But it preserves the same odd way of expressing Creative Commons licenses. So I wrote this function to classify the license in the way I'm used to:

def classify_cc(track):
    """
    Convenience function to identify CC licenses
    from the metadata that wacca.fm provides.
    """
    if not track['copyright']=='creativecommons':
        s = "ARR"
        return s
    else:
        s = "CC "
    if track['copyright_commercial']=='no':
        s += "By-NC"
    else:
        s += "By"
    if track['copyright_modifications']=='no':
        s += "-ND"
    if track['copyright_modifications']=='share':
        s += "-SA"
    return s

The expected argument is the dictionary contained in each element of the return list. So you can call this function to sort a set of results, or to identify the licenses using common abbreviations.

Licensing Notice

This work may be distributed under the terms of the Creative Commons Attribution-ShareAlike License, version 3.0, with attribution to "Terry Hancock, first published in Free Software Magazine". Illustrations and modifications to illustrations are under the same license and attribution, except as noted in their captions (most of the images in this article are fair-use screen captures). I waive all rights to the code examples, so they may be treated as Public Domain, see CC0.

Category: 

Author information

Terry Hancock's picture

Biography

Terry Hancock is co-owner and technical officer of Anansi Spaceworks. Currently he is working on a free-culture animated series project about space development, called Lunatics as well helping out with the Morevna Project.

Most forwarded

Interview with Dave Mohyla, of DTIDATA

Dave Mohyla is the president and founder of dtidata.com, a hard drive recovery facility based in Tampa, Florida.

TM: Where are you based? What does your company do?
DTI Data recovery is based in South Pasadena, Florida which is a suburb of Tampa. We have been here for over 10 years. We operate a bio-metrically secured class 100 clean room where we perform hard drive recovery on all types of hard disks, from laptop hard drives to multi drive RAID systems.

Anybody up to writing good directory software?

Since the very beginning, directories (of any kind) have had a very central role in the internet. (I have recently grown fond of Free Web Directory. Even Slashdot can be considered a directory: a collection of great news and invaluable user-generated comments. As far as software is concerned, doing a quick search on Google about software directories will return the free (as in freedom) software directories like Savannah, SourceForge, Freshmeat and so on, followed by shareware and freeware sites such as FileBuzz, PCWin Download Center and All Freeware (great if you're looking for shareware and freeware, but definitely less comprehensive than their free-as-in-freedom counterparts).

Interview with Mark Shuttleworth

Mark Shuttleworth is the founder of Thawte, the first Certification Authority to sell public SSL certificates. After selling Thawte to Verisign, Mark moved on to training as an astronaut in Russia and visiting space. Once he got back he founded Ubuntu, the leading GNU/Linux distribution. He agreed on releasing a quick interview to Free Software Magazine.

Is better education the key to finding better software?

I read David Jonathon's article Anybody Up To Writing Good Directory Software? the other day, which got me thinking about software directories in general. As David mentioned, many of the software directories one finds when doing a quick google search are free as in beer, not as in freedom. But what interests me is the software directories that already exist, providing a combination of both free as in beer software, and open source software. Sites such as Freeware Downloads and Shareware Download don't advertise themselves as providing free as in liberty software, but each of them have a good selection of open source software available... if you know where to look.

Most emailed

Free Open Document label templates

If you’ve ever spent hours at work doing mailings, cursed your printer for printing outside the lines on your labels, or moaned “There has got to be a better way to do this,” here’s the solution you’ve been looking for. Working smarter, not harder! Worldlabel.com, a manufacture of labels offers Open Office / Libre Office labels templates for downloading in ODF format which will save you time, effort, and (if you want) make really cool-looking labels

Creating a user-centric site in Drupal

A little while ago, while talking in the #drupal mailing list, I showed my latest creation to one of the core developers there. His reaction was "Wow, I am always surprised what people use Drupal for". His surprise is somehow justified: I did create a site for a bunch of entertainers in Perth, a company set to use Drupal to take over the world with Entertainers.Biz.

Update: since writing this article, I have updated the system so that the whole booking process happens online. I will update the article accordingly!

So, why, why do people and companies develop free software?

More and more people are discovering free software. Many people only do so after weeks, or even months, of using it. I wonder, for example, how many Firefox users actually know how free Firefox really is—many of them realise that you can get it for free, but find it hard to believe that anybody can modify it and even redistribute it legally.

When the discovery is made, the first instinct is to ask: why do they do it? Programming is hard work. Even though most (if not all) programmers are driven by their higher-than-normal IQs and their amazing passion for solving problems, it’s still hard to understand why so many of them would donate so much of their time to creating something that they can’t really show off to anybody but their colleagues or geek friends.

Sure, anybody can buy laptops, and just program. No need to get a full-on lab or spend thousands of dollars in equipment. But... is that the full story?

Fun articles

Santa Claus - the most successful open source project

It dawned on me the other day, as I was shopping for the dozens of gifts it seems I have to buy every December, that Santa Claus is the most successful open source project in history. (Bridget @ Illiterarty would agree with that). Santa Claus is essentially a marketing development that is embodied by everyone who stuffs a sock, gives a gift, hosts a dinner or wishes Merry Christmas over the holiday season.

Most emailed

Editorial

When I first started thinking about Free Software Magazine, I was feeling enthusiastic about the dream. I had Dave, Gianluca, and Alan willing to help me, I had established members of the free software community willing to help me out, I had writers volunteering their time and energy for free, and I had a generous offer from OpenHosting for servers, all before I'd proved myself. There was a sense of excitement in the air, and I thought maybe, just maybe, I could make this work.

Free Software Magazine uses Apollo project management software and CRM for its everyday activities!