Open source books @ Kindle
I finally bought my Kindle Paperwhite (which is jailbroken, btw). The cool thing about the Kindle is that it has a community that provides a lot of good stuff:
- Converter of Apple Docset documentation to Kindle
- Converter of source code trees
- Converter of RFC documents
- Converter of Python documentation
- Port of Coolreader
- Framework for writing “recipes” to convert any book
I used the last one, kindlefodder, to write recipes for the books The Little Book on CoffeeScript and Smooth CoffeeScript.
Kindlefodder takes care of a lot of the routine work itself.
First, I specified the document’s info, including the cover:
def document
  # download the cover image only once
  # (File.size? is nil when cover.gif is missing or empty)
  unless File.size?("cover.gif")
    `curl -s 'http://akamaicovers.oreilly.com/images/0636920024309/lrg.jpg' > cover.jpg`
    # grayscale GIF, shrunk to fit 400x300 ('>' means shrink only, never enlarge)
    run_shell_command "convert cover.jpg -type Grayscale -resize '400x300>' cover.gif"
  end
  # book's info
  {
    'title'    => 'The Little Book on CoffeeScript',
    'author'   => 'Alex MacCaw',
    'cover'    => 'cover.gif',
    'masthead' => nil
  }
end
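The `File.size?` guard is what makes the cover download a one-time step: it returns `nil` for a missing or zero-length file and the file size otherwise, so re-running the recipe skips curl and convert once a non-empty cover.gif exists. Below is a minimal stdlib-only sketch of that pattern; `ensure_file` is a hypothetical helper (not part of kindlefodder), and the block only writes placeholder bytes where the recipe would download and convert the real image.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of kindlefodder): yields only when +target+
# is missing or empty. File.size? returns nil in both cases, the size otherwise.
def ensure_file(target)
  yield target unless File.size?(target)
  target
end

Dir.mktmpdir do |dir|
  cover = File.join(dir, 'cover.gif')
  fetches = 0
  2.times do
    ensure_file(cover) do |path|
      fetches += 1                         # stand-in for the curl + convert steps
      File.write(path, 'fake image bytes')
    end
  end
  puts "fetched #{fetches} time(s)"        # the second pass finds a non-empty file
end
```

An empty file left behind by a failed download also counts as "missing", which is a convenient property of `File.size?` over `File.exist?`.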
Next, we fetch the first page (with the TOC) and write a YAML file with the list of articles:
def get_source_files
  # fetch the first page (with the TOC)
  @start_url = "http://arcturo.github.com/library/coffeescript/"
  @start_doc = Nokogiri::HTML run_shell_command("curl -s #{@start_url}")
  # create sections.yml
  sections = [{
    title: "Main",
    articles: extract_articles
  }]
  File.open("#{output_dir}/sections.yml", 'w') do |f|
    f.puts sections.to_yaml
  end
end
The resulting sections.yml will look like:
---
- :title: Main
  :articles:
  - :title: Introduction
    :path: articles/01_introduction.html
  - :title: Syntax
    :path: articles/02_syntax.html
  - :title: Classes
    :path: articles/03_classes.html
  - :title: Idioms
    :path: articles/04_idioms.html
  - :title: Compiling
    :path: articles/05_compiling.html
  - :title: Applications
    :path: articles/06_applications.html
  - :title: The Bad Parts
    :path: articles/07_the_bad_parts.html
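That file is ordinary Psych output for an array of hashes with Symbol keys, which is why every key carries a leading colon (`:title:`, `:path:`). Here is a stdlib-only sketch of producing such a file and reading it back, assuming a hard-coded two-article list in place of `extract_articles` and a temporary directory in place of kindlefodder's `output_dir`. Psych's `safe_load` rejects Symbols unless `Symbol` is listed in `permitted_classes`, so it has to be allowed explicitly to get the same hashes back.

```ruby
require 'yaml'
require 'tmpdir'

# Hard-coded stand-in for the list extract_articles would build.
articles = [
  { title: 'Introduction', path: 'articles/01_introduction.html' },
  { title: 'Syntax',       path: 'articles/02_syntax.html' }
]
sections = [{ title: 'Main', articles: articles }]

Dir.mktmpdir do |output_dir|
  file = File.join(output_dir, 'sections.yml')
  File.write(file, sections.to_yaml)
  puts File.read(file)

  # safe_load refuses Symbol keys unless Symbol is explicitly permitted.
  loaded = YAML.safe_load(File.read(file), permitted_classes: [Symbol])
  puts loaded == sections
end
```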
So it has one section (Main) and seven articles. More complex books could, of course, have multiple sections.
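A multi-section book is just a longer top-level list in sections.yml: one hash per section, each with its own `:articles:`. The sketch below builds such a list from a flat table of contents. The part names, the `toc` rows, and the rule of grouping articles by a part label are all hypothetical, chosen only to show the shape of the data.

```ruby
require 'yaml'

# Hypothetical flat TOC rows: [part, article title, saved path]
toc = [
  ['Basics',   'Introduction', 'articles/01_introduction.html'],
  ['Basics',   'Syntax',       'articles/02_syntax.html'],
  ['Advanced', 'Compiling',    'articles/05_compiling.html']
]

# group_by keeps parts in first-seen order, since Ruby hashes preserve
# insertion order; each group becomes one section.
sections = toc.group_by(&:first).map do |part, rows|
  {
    title: part,
    articles: rows.map { |_, title, path| { title: title, path: path } }
  }
end

puts sections.to_yaml
```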
Let’s take a look at the extract_articles method:
def extract_articles
  # the articles directory only needs to be created once
  FileUtils.mkdir_p "#{output_dir}/articles"
  # iterating over the Table of Contents and extracting articles
  @start_doc.search('ol.pages li a').map do |o|
    {
      title: o.inner_text,
      path: save_article_and_return_path(o[:href])
    }
  end
end
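Without the network and Nokogiri, `extract_articles` is a single `map` from TOC links to `{title:, path:}` hashes, with the actual download delegated to `save_article_and_return_path`. The sketch below reproduces that shape in plain Ruby. `TocLink`, `extract_articles_from`, and the `saver` lambda are hypothetical stand-ins for the Nokogiri `<a>` nodes, the recipe method, and the real save method; the sketch creates the articles directory once, before the loop.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical stand-in for a Nokogiri <a> node: link text plus href.
TocLink = Struct.new(:inner_text, :href)

# Hypothetical equivalent of extract_articles with its dependencies passed in.
def extract_articles_from(links, output_dir, saver)
  FileUtils.mkdir_p(File.join(output_dir, 'articles'))
  links.map do |link|
    { title: link.inner_text, path: saver.call(link.href) }
  end
end

links = [
  TocLink.new('Introduction', '01_introduction.html'),
  TocLink.new('Syntax',       '02_syntax.html')
]

Dir.mktmpdir do |dir|
  # The real recipe downloads and cleans the page here; this saver only builds a path.
  saver = ->(href) { "articles/#{href}" }
  p extract_articles_from(links, dir, saver)
end
```

Passing the saver in as a lambda keeps the TOC walk testable without any HTTP requests.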
And the save_article_and_return_path method, which fetches the actual article, cleans it, saves it and returns the path of the saved article:
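def save_article_and_return_path(href, filename = nil)
  # "/foo/bar/" becomes "articles/foo.bar": strip the leading and trailing
  # slash, then turn the remaining slashes into dots
  path = filename || "articles/" + href.sub(/^\//, '').sub(/\/$/, '').gsub('/', '.')
  # fetching the article (URL quoted so the shell passes it through untouched)
  full_url = @start_url + href.sub(/^\//, '')
  html = run_shell_command "curl -s '#{full_url}'"
  # cleaning the article: drop the .back element, if there is one
  article_doc = Nokogiri::HTML html
  b = article_doc.at(".back")
  b.remove if b
  # saving only the #content part and returning its path
  res = article_doc.at('#content').inner_html
  File.open("#{output_dir}/#{path}", 'w') { |f| f.puts res }
  path
end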
I’m using the beautiful Nokogiri Ruby library to chop up the HTML here.
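The two string manipulations at the top of `save_article_and_return_path` are easy to get wrong, so here they are in isolation. The first turns an href into a flat file name under `articles/`; the second appends the href to the start URL. `article_path`, `article_url`, and the sample hrefs are hypothetical names for illustration. Note that plain concatenation behaves differently from `URI.join`, which resolves a root-relative href such as `/02_syntax.html` against the host root rather than against the book's directory.

```ruby
require 'uri'

START_URL = 'http://arcturo.github.com/library/coffeescript/'

# Same transformation as the recipe: strip one leading and one trailing
# slash, then replace the remaining slashes with dots.
def article_path(href)
  'articles/' + href.sub(/^\//, '').sub(/\/$/, '').gsub('/', '.')
end

# Same as the recipe: drop the leading slash and append to the start URL.
def article_url(href)
  START_URL + href.sub(/^\//, '')
end

puts article_path('chapters/02_syntax.html')  # flattened into articles/chapters.02_syntax.html
puts article_url('/02_syntax.html')           # stays under /library/coffeescript/
puts URI.join(START_URL, '/02_syntax.html')   # resolves against the host root instead
```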
You don’t have to care about images, the .mobi format and stuff like that; Kindlefodder does it for you.
So, here are the resulting recipes: