Thursday, July 10, 2008

OpenOffice.org extension will add PDF editing

Easy PDF editing is coming to OpenOffice.org, but you'll have to be patient for a few months. Recently posted to the OpenOffice.org Extensions site, the Sun PDF Import extension (SPI) is only in beta, and works only with recent developer builds of OpenOffice.org 3.0, which is scheduled for release in September. Right now, the quality of the final release is anybody's guess, but the beta's capabilities fall squarely in the middle of the available PDF import tools.

To investigate SPI, you need to download and install a snapshot build of OpenOffice.org 3.0. Then, from within the build, you can install SPI from Tools -> Extension Manager. The next time you start OpenOffice.org 3.0, you'll be able to open PDF files from any of the options for opening an existing document in the File menu.
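For readers who prefer the command line, the same installation can probably be scripted. The sketch below is an assumption rather than part of Sun's instructions: it relies on the `unopkg` tool that ships with OpenOffice.org, assumes the tool is on your PATH (snapshot builds often keep it in their own program directory), and uses a hypothetical file name for the downloaded extension.

```python
import shutil
import subprocess
import sys


def build_install_command(oxt_path, unopkg="unopkg"):
    """Return the argument list that asks unopkg to add an extension."""
    return [unopkg, "add", oxt_path]


def install_extension(oxt_path):
    """Install an .oxt file for the current user via unopkg."""
    # Assumption: unopkg is reachable through PATH.
    executable = shutil.which("unopkg")
    if executable is None:
        raise FileNotFoundError("unopkg was not found on PATH")
    subprocess.run(build_install_command(oxt_path, executable), check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python install_spi.py pdfimport.oxt  (hypothetical file name)
    install_extension(sys.argv[1])
```

If `unopkg` is not on your PATH, replacing the `shutil.which` lookup with the full path to the copy inside the snapshot build's program directory should have the same effect.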

By default, SPI opens PDF files in the Draw application, although you could also use Impress, which shares much of the same code. This default might seem strange at first, especially if your PDF file is mostly text. Actually, though, using Draw is logical, given the limitations of the PDF format. No application -- not even Acrobat, the proprietary PDF editor made by Adobe, the company that wrote the PDF specification -- is able to edit more than a single line while preserving format. Given this limitation, importing to Draw makes sense, because it can treat each line as a separate text object for editing. Although rearranging a paragraph requires line-by-line editing with SPI, and can be tricky if you need to add an extra line, the extension leaves you no worse off than any other PDF editor.

But at least you are working in a relatively friendly user interface. Aside from the limitation of editing one line at a time, the worst problem you are likely to have is the automatic capitalization of the first character of each line if you have AutoCorrect turned on while you edit.

In testing, SPI's success at importing text depended largely on the fonts being used in the document. For best results, you need to have the fonts in the imported PDF file installed on your system; otherwise, SPI will use a substitute font that may not correspond to the letter spacing of the original. Also, while common fonts such as Helvetica or Times Roman create few problems, SPI seems to have trouble reading the metrics of some PostScript fonts and displaying them correctly. Usually, the display problem takes the form of a line of text that, converted to a text object, extends far beyond the page margins, and makes reformatting tedious if not impossible. At times, too, the problem leaves random spaces scattered throughout all the lines.
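Since results depend on having the document's fonts installed, it can help to find out which fonts a PDF names before importing it. The following is a rough sketch, not anything SPI itself provides: the function name `list_base_fonts` is invented for illustration, and the approach simply scans the raw file for `/BaseFont` entries, so it will miss fonts declared inside compressed object streams.

```python
import re
import sys

# Matches "/BaseFont /Name" entries in uncompressed PDF objects.
BASE_FONT = re.compile(rb"/BaseFont\s*/([^\s/<>\[\]()]+)")
# Subset fonts carry a six-letter tag such as "ABCDEF+" before the real name.
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def list_base_fonts(pdf_bytes):
    """Return the sorted, de-duplicated font names found in raw PDF bytes."""
    names = set()
    for match in BASE_FONT.finditer(pdf_bytes):
        name = match.group(1).decode("latin-1")
        names.add(SUBSET_TAG.sub("", name))
    return sorted(names)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python list_fonts.py document.pdf
    with open(sys.argv[1], "rb") as handle:
        for font in list_base_fonts(handle.read()):
            print(font)
```

Any name in the output that is not installed on your system is a likely candidate for font substitution, and so for the spacing problems described above.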

Graphics in imported PDF files had similarly mixed success during testing. Many import into Draw without any trouble, with text wrapping around them in the same style as the original document. However, some PNG images -- but not all -- were imported vertically inverted, and, in another case, a graphic became an uneditable object. In some complex layouts, too, the positioning of some graphics was off by perhaps a dozen pixels.

In the beta, SPI cannot handle PDF forms, and text alignment is not always preserved, with fully justified text showing a strong tendency to import as left-aligned. Nor are hyperlinks supported, although they are a basic necessity for many online documents.

Otherwise, the list of what SPI can handle is much longer than the list of problems. Text frames, sections, multilevel lists, and table formatting, including border and background color, are all imported without any problems, which makes for a promising start for the extension.

For now, though, the problems with rendering fonts and graphics mean that SPI, like OpenOffice.org 3.0, is not ready for production use.

Still, in its current state, SPI is ahead of AbiWord, which simply extracts the text from a PDF file and not the graphics, and KWord, which preserves line divisions but not most other text formatting.

But SPI's current state is behind that of Inkscape, whose main limitations are a restriction to single-page imports and the failure to preserve hyperlinks. Nor is SPI as reliable as PDFedit, which, despite being aimed at advanced users, remains the most reliable PDF editor for the GNU/Linux desktop. Still, a lot of development can happen in the next few months, and if SPI continues as it has started, its final release just might become an essential OpenOffice.org extension.