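Environment: CentOS 5.5 + Ruby 1.8.7 + pdfkit 0.4.6
I recently generated PDF documents from my blog, one PDF for every half year. I tried quite a few open-source components and found that not many are pleasant to use; the two better ones are prawn and pdfkit.
In the end pdfkit turned out to work very well. It converts HTML + CSS into a PDF document and uses wkhtmltopdf underneath, and wkhtmltopdf can also be called directly from the shell, which is very convenient.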
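As a rough illustration of calling wkhtmltopdf directly, here is a minimal sketch in Ruby. It assumes the basic `wkhtmltopdf <input> <output>` form and the binary path used later in this post. The helper names `wkhtmltopdf_command` and `render_pdf` are made up for this example and are not part of pdfkit.

```ruby
# Build the argument list for a direct wkhtmltopdf call:
# the binary, then the input (a URL or an HTML file), then the output path.
# The default binary path is an assumption; adjust it for your system.
def wkhtmltopdf_command(source, output, binary = '/usr/local/bin/wkhtmltopdf')
  [binary, source, output]
end

# Run the command with separate arguments, so no shell quoting is involved.
# Returns true only when the process exits successfully.
def render_pdf(source, output)
  system(*wkhtmltopdf_command(source, output)) ? true : false
end

# Example call (left commented out because it needs wkhtmltopdf and network access):
# render_pdf('http://blog.wxianfeng.com', 'blog.pdf')
```

Passing the arguments separately to `system` avoids having to escape URLs that contain `&` or `?`.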
1. Install pdfkit
>gem install pdfkit
>sudo pdfkit --install-wkhtmltopdf
This failed with an error saying that lzcat could not be found, so install it:
>yum update
>yum install lzma
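Before running the installer again, it may help to confirm that the tools are actually on the PATH. The following is a small sketch of such a check. `find_executable` is a hypothetical helper written for this example, not something pdfkit provides.

```ruby
# Return the full path of `name` if it is an executable file in one of the
# directories listed in `path`, otherwise nil.
def find_executable(name, path = ENV['PATH'].to_s)
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Report where each tool was found, if anywhere.
%w[lzcat wkhtmltopdf].each do |tool|
  location = find_executable(tool)
  puts "#{tool}: #{location || 'not found'}"
end
```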
2. The script that generates the PDF documents
# Note: a URL such as http://blog.wxianfeng.com must appear as an <a> link,
# because wkhtmltopdf can fetch a URL directly to generate a PDF
require 'rubygems'
require 'pdfkit/source'
# require "pdfkit" raised an error saying PDFKit could not be found
require 'pdfkit/pdfkit'
require 'pdfkit/middleware'
require 'pdfkit/configuration'

PDFKit.configure do |config|
  config.wkhtmltopdf = '/usr/local/bin/wkhtmltopdf'
end

# One [start, end] pair per PDF.
# June 31 is not a real calendar date; the values are kept as in the original
# because they also appear in the output file names.
range_t = [
  ["2009-06-30", "2009-12-12 23:59:59"],
  ["2009-12-12", "2010-06-31 23:59:59"],
  ["2010-06-31", "2010-12-12 23:59:59"]
]

path = "/usr/local/system/src/blog.wxianfeng.com_pdf/"
# PDFs that already exist, ignoring directory entries and housekeeping files
exist_files = Dir.open(path).to_a.select { |x| x != '.' && x != '..' && x != '.svn' && x != 'Thumbs.db' }

range_t.each do |ele|
  file_name = "blog.wxianfeng.com_#{ele.first}~#{ele.last.slice(/\d+-\d+-\d+/)}.pdf"
  # Skip ranges whose PDF has already been generated
  next if exist_files.include?(file_name)

  posts = Content.all(:conditions => ["published_at BETWEEN ? AND ?", ele.first, ele.last])
  html = ''
  posts.each do |i|
    p i.published_at.to_s(:db) + " " + i.title
    # Title, then the post body with bare URLs wrapped in <a> links
    html << "<strong>" + i.title + "</strong><br/><br/>" +
      i.html(:all).gsub(/[\s\n<br\/>]([a-zA-Z]+:\/\/[^\s<>"]*)/, '<a href="\1">\1</a>') +
      "<br/><br/><br/><br/>"
  end

  kit = PDFKit.new(html)
  # kit.stylesheets << "#{RAILS_ROOT}/themes/lindholmen/stylesheets/main.css"
  # kit.stylesheets << "#{RAILS_ROOT}/themes/lindholmen/stylesheets/print.css"
  # kit.stylesheets << "#{RAILS_ROOT}/themes/lindholmen/stylesheets/local.css"
  kit.to_pdf
  kit.to_file(path + file_name)
end
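The URL-linking step in the script uses a character class in front of the URL, and the character it matches is dropped from the output. Here is a sketch of one possible variant that keeps the preceding character. `URL_PATTERN` and `autolink` are names invented for this example, and this is not the script's original behavior. One known limitation: a URL that is already the visible text of an `<a>` element follows a `>` and would be wrapped a second time.

```ruby
# A bare URL is one that starts a line or follows whitespace or a closing ">".
# Group 1 is the preceding character (possibly empty), group 2 is the URL.
URL_PATTERN = /(^|[\s>])([a-zA-Z]+:\/\/[^\s<>"]+)/

# Wrap bare URLs in <a> tags while keeping the character before the URL.
# URLs inside href="..." follow a double quote and are left untouched.
def autolink(html)
  html.gsub(URL_PATTERN) { "#{$1}<a href=\"#{$2}\">#{$2}</a>" }
end

puts autolink("see http://blog.wxianfeng.com now")
```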
3. Run it
ruby script/runner script/tools/generate_pdf.rb
Note that script/runner uses the database configured for the development environment.
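Since the default is easy to overlook, the script could check the environment before it touches the database. Below is a minimal sketch, assuming the usual Rails 2.x convention that the environment comes from the `RAILS_ENV` variable and falls back to `development`. Both helper names are hypothetical.

```ruby
# Resolve the environment name the way a Rails 2.x app usually does:
# take RAILS_ENV if it is set, otherwise fall back to "development".
def effective_rails_env(env = ENV)
  env['RAILS_ENV'] || 'development'
end

# Hypothetical guard for the top of generate_pdf.rb: stop early if the script
# is about to run against a database other than the expected one.
def ensure_environment!(expected, env = ENV)
  actual = effective_rails_env(env)
  raise "expected #{expected} environment, got #{actual}" unless actual == expected
  actual
end

puts "running in: #{effective_rails_env}"
```

To run against another database, script/runner accepts an environment option, for example `ruby script/runner -e production script/tools/generate_pdf.rb`.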