LAMP (PHP) Based Word/HTML to ePub Converter
I'm looking to create a web-based application similar to Smashwords' Meatgrinder, but one that runs on a Linux system and works on documents saved from MS Word 2010 or later as "Web Page, Filtered" -- I do not need to parse actual .docx files. The system can be single-user, but the ability to expand to multi-user functionality would be a definite plus, as I am considering launching it commercially (and am therefore in the market for a possible profit-sharing partner, though I will also consider a contractor).
The converter should take the HTML which MS Word produces and:
1) Separate the HTML from the CSS, creating an external CSS file with *only* relevant CSS
2) Split the HTML up into separate files of no more than 250 KB each
3) Convert all convertible character entities
4) Eliminate empty tags--EXCEPT paragraph tags, which should be left in place.
5) Strip out redundant code (font data, etc.)
6) Recognize page breaks and section breaks
7) Recognize the table of contents and use it to produce a toc.ncx file
8) Recognize bookmarks and move them from within a paragraph or header to before the paragraph/header
9) Recognize bulleted lists and convert them to proper UL lists
10) Recognize numbered lists (with special style names, if necessary) and convert them to proper OL lists
11) Allow online editing of the final HTML for tweaking (a visual editor is a plus, but code-only is OK; in a single-user version, editing via the shell is OK, but in a commercial version online editing must be possible)
12) Allow upload of a cover image
13) Allow upload of all included images
14) Allow CSS to be modified
15) Produce an ePub file which passes ePubCheck (ePub 2.0 is fine for now)
16) Use the ePub file to produce a .mobi file with KindleGen, without KindleGen reporting errors
17) Produce a second ePub file for Apple's iBookstore -- this file's HTML may need to be edited a final time to meet their standards
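To give a sense of the kind of processing involved, here is a minimal PHP sketch of item 4 (eliminating empty tags while leaving paragraph tags in place). It is an illustration under stated assumptions, not a required design. The function name stripEmptyTags is hypothetical. Keeping void elements (br, hr, img), table cells, and named anchors (which Word uses for bookmarks, see item 8) is an assumption about what "empty" should exclude. It relies only on PHP's bundled DOMDocument and DOMXPath classes.

```php
<?php
declare(strict_types=1);

/**
 * Remove empty elements from a complete HTML document, but keep <p> tags.
 *
 * Assumptions (not part of the spec above): void elements, table cells and
 * named anchors are also kept, and an element counts as "empty" when it has
 * no child elements and no text other than ordinary whitespace.
 */
function stripEmptyTags(string $html): string
{
    $doc = new DOMDocument();

    // Word's filtered HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $keep  = ['p', 'br', 'hr', 'img', 'td', 'th'];

    // Repeat until a pass removes nothing: deleting an empty <b> can leave
    // its parent <span> empty, which the next pass then removes.
    do {
        $removed    = 0;
        $candidates = $xpath->query('//body//*[not(*) and normalize-space(.) = ""]');

        // Copy the result to an array so removals do not disturb iteration.
        foreach (iterator_to_array($candidates) as $node) {
            $name = strtolower($node->nodeName);

            if (in_array($name, $keep, true)) {
                continue; // paragraphs (and the other kept tags) stay in place
            }
            if ($name === 'a' && ($node->hasAttribute('name') || $node->hasAttribute('id'))) {
                continue; // bookmark anchors are empty on purpose
            }

            $node->parentNode->removeChild($node);
            $removed++;
        }
    } while ($removed > 0);

    return $doc->saveHTML();
}

// Example input resembling Word's "Web Page, Filtered" output.
$sample = '<html><body>'
    . '<p class="MsoNormal"><span style="color:red"><b></b></span></p>'
    . '<p>Text <i> </i>here</p>'
    . '<p></p><a name="ch1"></a><br>'
    . '</body></html>';

$cleaned = stripEmptyTags($sample);
```

In this sketch a span holding only a non-breaking space is treated as non-empty, because XPath's normalize-space() does not count U+00A0 as whitespace. Whether such spans should be stripped is a decision for the final design.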
I have dozens of Word-formatted books available for testing purposes. All have been formatted to exacting standards using built-in and custom Word styles and have been run through a similar program successfully (for which the code is available--it's just not as robust as I need it to be).
Extensive comments in the code are critical, as future modifications may be required, and if the original coder is no longer available (too busy, hit by a bus, etc.), anyone with the necessary skills should be able to make the modifications.
Please send me a PM describing your experience and tell me how you would go about this project. Include whether you are interested in a profit-sharing partnership or, if not, what your bid would be for completing this project, and your payment terms. Please include any other information you feel would be helpful to me.