Automating book covers w/ XML

Simon & Schuster | Steve Kotrch, Director of Publishing Technology | [email protected] | Twitter: steveko

description

How, through the use of XML and databases, we simplified and automated the process of creating book covers for S&S.

Transcript of Automating book covers w/ XML

Page 1: Automating book covers w/ XML

Simon & Schuster

Steve Kotrch, Director of Publishing Technology

[email protected]

Twitter: steveko

1

Page 2: Automating book covers w/ XML

Automating Book Covers with MarkLogic

2

Page 3: Automating book covers w/ XML

• Simon & Schuster, a Trade publisher

• The problem with the cover workflow

• The solution, “Cover Automation.” We’re going to explore how MarkLogic Server together with InDesign’s XML capabilities help us make this happen.

• Where it works, where it doesn’t. Where it might work in the future.

What we’ll cover

3

Point 3 is of course the meat of this presentation.

Page 4: Automating book covers w/ XML

Simon & Schuster

Trade Book Publisher

4

Fiction, non-fiction, audio books, childrenʼs books, ebooks. Roughly 2,000 titles per year. In business since 1924.

Page 5: Automating book covers w/ XML

• Finance: venture capital for authors

• Content creation

• Prepare content for publication

• Manufacturing

• Distribution

• Marketing

• Sales

5

Trade Publishing

What it means to be a trade book publisher. We pay authors to write books, which makes us a kind of venture capitalist. We donʼt create our own content; thatʼs for the authors to do. We do prepare content for publication: we copyedit, design, and create marketing material--and a book cover these days is marketing, like packaging for any product. We arrange for manufacturing. We distribute, market and sell books.

Page 6: Automating book covers w/ XML

• .NET development environment

• SQL Server on the back end

• MarkLogic for content management

Technology

6

In terms of technology, the infrastructure is mostly Microsoft, with a .NET development environment. We use SQL Server for our databases, including business-critical systems. And of course MarkLogic for content management. So why MarkLogic?

Page 7: Automating book covers w/ XML

Digital Warehouse

• Scanning of 15,000 books

• Printable, searchable PDF; XML; OEBPS

• Rights database

• Scan contracts

• Enter rights information into a database

• Data Distribution

• XML, binaries

7

Six years ago or more: future-proof content. >15,000 titles. POD PDF (search), OEBPS, XML. 80 years: >15,000 titles, chosen for marketability & rights. => Scan author contracts & create a rights database. ML Server to help with distribution of content & metadata--in XML, to multiple destinations, each with unique requirements. First implementation: store scanned, OCR’d contracts.

Page 8: Automating book covers w/ XML

Digital Warehouse

TMM / Product Database

DATA

Chuckwalla

DAM

MarkLogic Server

XML

Distribution Platforms

Search Engines

Retailers

THE INTERNET

8

Now our Syndication Server, as we call it, is part of a 3-pronged approach: structured data (SQL Server); binary data (DAM); XML data (MarkLogic Server). Now book XML is used not just for future-proofing, but also for search and for extracting sample chapters.

Page 9: Automating book covers w/ XML

Key to Online Marketing: Search

9

Why search is important, and not just for books. 19% of all online retail is done through Amazon.com. Completeness and quality of information gets top hits. This is a 10-year-old book, still the top hit for this topic because of the completeness and quality of its information.

Page 10: Automating book covers w/ XML

Current ML “Servers”

• Contracts (Executed: Scanned, OCR’d)

• Syndication

• Cover Automation

• Content Enrichment (soon)

10

I talked about three uses to which we’ve put MarkLogic at S&S, and later I’ll discuss a fourth one which will be coming on line soon, along with other plans we have for the technology.

Page 11: Automating book covers w/ XML

11

Cover Automation

Let’s turn back to the main topic of this talk. Mentioned before: Not STM, not Educational. On the XML FIRST value graph, down near zero. But here I found a compelling case for creating an XML-based publishing system.

Page 12: Automating book covers w/ XML

12

Here is the jacket for a pretty successful hardcover book, Sing You Home by Jodi Picoult. Letʼs look at the problems involved in developing book covers like this one. This file is headed for manufacturing. Book covers in todayʼs marketplace are marketing: packaging, the representation to the world of that product. (A couple of slides ago we saw images representing the products that S&S publishes. You probably didnʼt even give it a thought that these were pictures of book covers.) “Placeholder image” => 10% bump in pre-publication sales. Just the right look, and just the right words. One impetus to developing the system: print = online. Another revolves around two issues . . . <click>

Page 13: Automating book covers w/ XML

Two Issues

Multiple formats

Multiple inputs

13

Page 14: Automating book covers w/ XML

14

One problem facing our designers is that for a publisher the size of Simon & Schuster, the books come in a lot of different sizes, in various formats, and from various imprints. Imprints are profit units, each of which has a slightly different outlook on the world and is likely to be run by someone with a healthy ego and strong opinions, so each ends up with a slightly different layout. All these different layouts and sizes pose a problem for designers and for the workflow: getting clear, definitive and accurate information to the designers about what theyʼre supposed to create.

Page 15: Automating book covers w/ XML

Two Issues

Multiple formats

Multiple inputs

15

Page 16: Automating book covers w/ XML

16

EMAILS

WORD FILES ON A FILE SERVER

PRINTED OR WRITTEN NOTES

Added to this was the fact that information about what is supposed to go on the cover was coming from multiple and sometimes contradictory inputs: emails, Word files that were dropped in one or another folder on a file server, or printed or even written notes. Also, these werenʼt coming to the designer in any particular order or even following a schedule. Editors with clout left off entire sections of copy until the last minute. Youʼre going to hear me use the term “copy” here, which comes from the advertising and publishing industries. It means “text.”

Page 17: Automating book covers w/ XML

• Definitive, authoritative information about cover format and size

• A stable, controlled set of cover templates

• A single source for cover copy

• Digital delivery of copy to Design

The Solution

17

<click> The thing is, we have all the pertinent format information about book covers in our production database. I figured we should use it. <click> To help avoid confusion, I also got buy-in from the art directors for them to develop a definitive, concise library of templates and, though it might not seem important, a vocabulary of terms. <click> We also have a system for editors to write descriptive copy about books for internal use, the tipsheet system. So theyʼre used to writing copy in a web-based system. Writing cover copy in a related system isnʼt that much of a stretch.

Page 18: Automating book covers w/ XML

Cover Workflow

Acquisition Editor

Tipsheet

Editor or Copywriter

Cover Copy

Finished Copy

Designers Designers

Finished Layout

Review:

Specifications

Marketing

White Layout

18

What we established with the aid of Cover Automation is a clear workflow for book covers, from the creation of cover copy through to the design of the cover itself. This is a somewhat simplified view.
1. The Acquisition Editor writes his or her Tipsheet.
2. Once Editorial and Marketing agree on a format for the book and some initial specifications (format, size, approximate length), this information is shared with Production.
3a. The Editor or sometimes a Copywriter writes the cover copy, usually based on what was written for the Tipsheet.
3b. Production finalizes the specifications.
4. The cover copy goes through a digital review and revision process.
5. The Cover Automation system combines the finished copy, specs, and metadata to create the White Layout.

Page 19: Automating book covers w/ XML

[Figure: cover diagram with its sections labeled (SPINE, FRONT COVER, FRONT FLAP, BACK FLAP, BACK COVER) and its elements labeled: TITLE, SUBTITLE, AUTHOR, QUOTES, QUOTE, READING LINE, CREDITS, ISBN & PRICES, PRICES, AUTHOR BIO, AUTHOR PHOTO, “ALSO” LINE, DESCRIPTION]

The Cover and its Features

19

We have a vocabulary for describing the elements of a cover, and itʼs got sections that we have names for, but you canʼt say that covers are really structured. The elements appear in different places, in a different order; not all covers have all elements. Still, we can use the information we have to our advantage, especially because we have a tool like MarkLogic to work with.

Page 20: Automating book covers w/ XML

20

The Cover Editor

Here is the Cover Editor. It is generated by our product database, Title Management. The Cover Editor provides keys to solving the problem. <click> Important metadata: type (hardcover, so a dust jacket) and size of the cover. The Cover Editor is also the single source for cover copy: It allows Editorial to write and edit the copy for the various sections of a cover. <click> It allows them to place items in the order they want them to appear in (although here the authorʼs name isnʼt on the top of the front cover, where it should be). Of course it also allows them to add elements, such as praise quotes.

Page 21: Automating book covers w/ XML

21

Letʼs scroll down and take a closer look at the bookʼs description, and follow that through. The Cover Editor is a pretty full-featured program. It allows for basic formatting--bold and italics--and keeps track of all text changes, recording when they happened and who made them. I noticed that Sarah Branham added most of the elements to this cover, and that the copyeditor Carole Schwindeller worked on it as well. The right-hand column is for notes.

Page 22: Automating book covers w/ XML

22

The system also provides routing of the cover copy through various departments. When the user <click> “sends” the cover copy to someone, the system creates an email that identifies the sender. They can write a message to the recipient, and this email contains a link to this Web page. Once all the approvals are received, Editorial sends the cover copy to the designer. <click> She clicks the Build Cover button. She then chooses <click> the imprint, format and size, and tells the system to build a cover for her. We debated creating the cover automatically because the system has all that information, but the art directors felt that there were enough departures from the standards involved that they were more comfortable having the designers go through this step and choose manually.

Page 23: Automating book covers w/ XML

23

The White Layout

The result that is downloaded to the designerʼs desk is what I call the White Layout. <click> Here you can see the Description that we have been following. Letʼs take a look at what is happening behind the scenes.

Page 24: Automating book covers w/ XML

24

Behind the layout--and what the designer doesnʼt see--is an XML structure. Hereʼs the Description again <click>, with its corresponding element in the structure. So how do we get to this point?

Page 25: Automating book covers w/ XML

25

Cover Editor/SQL Server: Cover Copy in XML

Template in IDML

“White Layout”

The “Replacement Engine”

We took a look at the Cover Editor, which contains the metadata and the cover copy. The SQL database behind the Cover Editor generates an XML version of the data to go on the cover. Another key element is the template, which weʼll examine in a moment. The InDesign template corresponding to the format and size of the book is exported to a variety of XML called IDML, which the MarkLogic server can read, and the server combines the two to create the White Layout. Weʼve all heard about Babbageʼs Difference Engine. What we have here is a Replacement Engine. Replacement is a key concept to understanding how this system works. Letʼs take a look at that process step-by-step.

Page 26: Automating book covers w/ XML

26

The Template (indd)

Here it is in InDesign format. Once the designer has the layout set, we identify the elements that are to be populated with cover copy, using InDesignʼs capability to introduce structure into layouts. Notice that this one contains 14 structural elements. So in addition to the cover copy that Editorial writes, it allows us to populate items like prices and ISBN, and what you might call “boilerplate,” like the line that tells buyers that the book is also available as an ebook or an audio. This is a great timesaver, and eliminates the possibility of a lot of errors, as it used to be up to the designers (notoriously bad typists) to type in the ISBN, prices, and tag lines like “Also available as an ebook.” I want to bring to your attention the element that is on the flap. It is a placeholder frame <click>. There is only one element here, yet, as weʼll see, there will be more than one element that is to be placed in the white layout here. We could never predict how many elements there might be in a section, and weʼll discuss how we solved that particular problem a little later on.

Page 27: Automating book covers w/ XML

27

The Template (idml)

Once the template is set, it is saved in IDML format. IDML is an XML format, consisting of XML files describing each aspect and piece of the layout, placed in a ZIP wrapper, similar to what Microsoft is doing with its Office documents. What is important about the fact that Adobe has done this is that the IDML format is both complete and native. That is key for us, because it allows us to edit or even create InDesign documents through XML. Here is the template opened in an XML editor. Notice all the folders listed down the left side of the window. <click>
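Because an IDML file is a ZIP wrapper around XML files, the folder view described above can be reproduced with any ZIP library. The following is a minimal sketch, not part of the S&S system: it builds a tiny stand-in package in memory (the file names and contents are illustrative assumptions, not an actual S&S template) and groups its parts by top-level folder.

```python
import io
import zipfile


def list_idml_parts(idml_bytes: bytes) -> dict[str, list[str]]:
    """Group the file names inside an IDML package by top-level folder."""
    groups: dict[str, list[str]] = {}
    with zipfile.ZipFile(io.BytesIO(idml_bytes)) as package:
        for name in package.namelist():
            folder = name.split("/", 1)[0] if "/" in name else "(root)"
            groups.setdefault(folder, []).append(name)
    return groups


def make_demo_package() -> bytes:
    """Build a tiny stand-in package; names and contents are hypothetical."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("mimetype", "application/vnd.adobe.indesign-idml-package")
        package.writestr("designmap.xml", "<Document/>")
        package.writestr("Stories/Story_u1.xml", "<Story Self='u1'/>")
        package.writestr("Stories/Story_u2.xml", "<Story Self='u2'/>")
        package.writestr("Spreads/Spread_s1.xml", "<Spread Self='s1'/>")
    return buffer.getvalue()


parts = list_idml_parts(make_demo_package())
for folder, names in parts.items():
    print(folder, names)
```

To inspect a real template, the same function could be given the bytes of an exported .idml file instead of the demo package.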

Page 28: Automating book covers w/ XML

28

The Template (idml)

Here Iʼve opened the “Stories” folder, and chosen one of the “Stories.” Iʼve highlighted the tag that identifies this Story as the “placeholder” frame for copy that is to be placed on the flap <click>. A “placeholder” frame is the first in a column of one or more frames in that Section.

Page 29: Automating book covers w/ XML

29

Cover-Info

The other key building block is the XML output by the SQL database. We refer to it as “cover-info.” Cover-info XML is not valid, but it is well-formed and MarkLogic is perfectly fine with it. It is a collection of the elements that are to appear on the cover, both those that were written by Editorial and data generated from the product database. <click> We can see the first section, frontcover, and the first three and part of a fourth element, identified as TextFrames: the TITLE, SUBTITLE, AUTHOR, and part of the HEADLINE, like “#1 New York Times bestselling author.” The words “New York Times” are in a SPAN identified as “emphasis.”
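To make the shape of cover-info concrete, here is a hedged sketch of what such a document might look like and how it could be read. The element and attribute names (`cover-info`, `section`, `TextFrame`, `placeholder-type`, `span`) are assumptions pieced together from the terms used in this talk, and the ISBN is a dummy value; the actual S&S markup is not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical cover-info document; only the title, author, and headline
# text come from the talk. Everything else is an illustrative assumption.
COVER_INFO = """
<cover-info isbn="0000000000000">
  <section name="frontcover">
    <TextFrame placeholder-type="TITLE" ParagraphStyle="Title">Sing You Home</TextFrame>
    <TextFrame placeholder-type="AUTHOR" ParagraphStyle="Author">Jodi Picoult</TextFrame>
    <TextFrame placeholder-type="HEADLINE" ParagraphStyle="Headline">#1 <span class="emphasis">New York Times</span> bestselling author</TextFrame>
  </section>
</cover-info>
"""


def summarize(cover_info_xml: str) -> list[tuple]:
    """Return (section, placeholder type, plain text, emphasized spans) per frame."""
    root = ET.fromstring(cover_info_xml)
    rows = []
    for section in root.iter("section"):
        for frame in section.iter("TextFrame"):
            text = "".join(frame.itertext())
            emphasized = [
                span.text for span in frame.iter("span")
                if span.get("class") == "emphasis"
            ]
            rows.append((section.get("name"), frame.get("placeholder-type"), text, emphasized))
    return rows


rows = summarize(COVER_INFO)
for row in rows:
    print(row)
```

Well-formedness is all this parser needs, which matches the point above: the document does not have to validate against a schema to be processed.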

Page 30: Automating book covers w/ XML

30

If we scroll down through this file, we come to the Description, which we were following. Itʼs part of the Section called “flap.” There is a quote from Stephen King followed by the Description we were looking at before. So, as I pointed out before, here are two text frames that are to be positioned in the same section. Hereʼs how we solve that problem: Notice the InDesign-specific information. There is an attribute in the section tag called y-offset. <click> This indicates that each TextFrame for that section is to start 54 points--or 3/4 inch--below the previous one. Also notice the ParagraphStyle attribute. It corresponds with InDesign paragraph styles, and the CharacterStyle attribute corresponds with character styles set up in the template. Where MarkLogic comes into the equation, as you might have guessed, is putting these two XML streams--the IDML of the template and the XML of the cover-info--together to generate the White Layout.
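The y-offset arithmetic can be checked in a few lines. This is a small sketch under the assumption that each frame in a section starts a fixed number of points below the start of the previous one; the starting position of 36 points is a made-up value for illustration.

```python
POINTS_PER_INCH = 72  # standard typographic points per inch


def points_to_inches(points: float) -> float:
    """Convert typographic points to inches."""
    return points / POINTS_PER_INCH


def frame_tops(first_top_pt: float, y_offset_pt: float, frame_count: int) -> list[float]:
    """Top position of each stacked frame, every one y_offset_pt below the last."""
    return [first_top_pt + index * y_offset_pt for index in range(frame_count)]


# Two frames on the flap (the quote, then the Description), 54 points apart.
tops = frame_tops(first_top_pt=36.0, y_offset_pt=54.0, frame_count=2)
print(tops)
print(points_to_inches(54))
```

With 72 points to the inch, 54 points works out to 0.75 inch, which agrees with the figure quoted above.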

Page 31: Automating book covers w/ XML

31

Letʼs look behind the scenes at what happens when the Designer pushes the “Generate Layout” and then “Start” buttons. In this administratorʼs view we can see the templates weʼve saved in IDML format. These correspond to the choices made by the designer in terms of division, format and size. Clicking on one of these listings we can see the ISBNs which were processed with that template. <click> Clicking on that displays the date and time that it was processed, along with a string that shows us what is happening: <click> At the bottom of the browser we see a string that calls a module called Controller, an XQuery, and passes to it two arguments, the ISBN and the template.

Page 32: Automating book covers w/ XML

32

Iʼm not going to go through the XQueries line by line. Iʼm not an XQuery expert by any stretch, and that would take us the better part of a month to do.

Page 33: Automating book covers w/ XML

33

Suffice it to say that Controller does what its name implies: it runs the show. The heavy lifting is done by another XQuery, lib-ss. Thatʼs where “run-layout” resides.

Page 34: Automating book covers w/ XML

34

Lib-ss is 14 screens long, but the operative phrase appears in this snippet: “replace all the replaceable content.” A few slides back we looked at the XML contained in one of the IDML templates. It has placeholder-type tags, and values for them. We also looked at the cover-info XML delivered by the SQL database. It has tags in it called placeholder-type, pageitem type, and pageitem-number.

Page 35: Automating book covers w/ XML

35

The cover-info is placed into a corresponding folder in one part of the MarkLogic system <click>. What lib-ss and its minions do is open the cover-info XML and for each XML element <click> it looks into the content folder <click> for the relevant JobTicket and finds the Story with the relevant placeholder-type <click>--the “flap” in our case. It then replaces the contents of that story with the contents from cover-info <click>. It stores the result in the staging folder <click>. If there is more than one pageitem for that placeholder-type--the Description in “flap” in our case--it generates another Story, identical in type and size, and drops it in, 3/4” lower on the layout. Once it is through processing the contents of cover-info, it then combines--in memory--the Stories that are in Staging with the ones in the template, replacing whatever stories there have the same ID.
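The replacement steps above can be sketched in a few dozen lines. The real system does this in XQuery inside MarkLogic, and lib-ss itself is not shown in the talk, so what follows is only an illustration of the idea in Python. The XML is deliberately simplified: the `top` attribute, the `Self` IDs, the "-2" suffix for generated stories, and the sample text are all assumptions, and real IDML keeps frame geometry outside the Story files.

```python
import copy
import xml.etree.ElementTree as ET

# Simplified stand-ins for the template Stories and the cover-info XML.
TEMPLATE_STORIES = """
<stories>
  <Story Self="u1" placeholder-type="frontcover" top="36"><Content>TITLE GOES HERE</Content></Story>
  <Story Self="u2" placeholder-type="flap" top="36"><Content>FLAP COPY GOES HERE</Content></Story>
</stories>
"""

COVER_INFO = """
<cover-info>
  <section placeholder-type="flap" y-offset="54">
    <TextFrame pageitem-number="1">Praise quote text</TextFrame>
    <TextFrame pageitem-number="2">Description text</TextFrame>
  </section>
</cover-info>
"""


def build_white_layout(template_xml: str, cover_info_xml: str) -> dict:
    """Replace placeholder Stories with cover-info content, keyed by Story ID."""
    template = {s.get("Self"): s for s in ET.fromstring(template_xml).iter("Story")}
    staging = {}
    for section in ET.fromstring(cover_info_xml).iter("section"):
        kind = section.get("placeholder-type")
        offset = float(section.get("y-offset", "0"))
        placeholder = next(
            (s for s in template.values() if s.get("placeholder-type") == kind), None
        )
        if placeholder is None:
            continue  # the template has no frame for this section
        for index, frame in enumerate(section.iter("TextFrame")):
            story = copy.deepcopy(placeholder)
            story.find("Content").text = "".join(frame.itertext())
            if index > 0:
                # Extra page items get a new Story, shifted down by the y-offset.
                story.set("Self", f"{placeholder.get('Self')}-{index + 1}")
                story.set("top", str(float(placeholder.get("top")) + index * offset))
            staging[story.get("Self")] = story
    merged = dict(template)
    merged.update(staging)  # staged Stories replace template Stories with the same ID
    return merged


layout = build_white_layout(TEMPLATE_STORIES, COVER_INFO)
for story_id, story in layout.items():
    print(story_id, story.get("top"), story.find("Content").text)
```

Keying both the template and the staging area by Story ID is what makes the final merge a plain overwrite: untouched Stories survive, replaced ones win, and generated ones are simply added.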

Page 36: Automating book covers w/ XML

36

It ZIPs this collection of files into an IDML, which the user downloads. Here is the IDML, opened in an XML editor, with the first Story of the flap displayed. If you look closely enough, <click> youʼll see that the local formatting is preserved from the Cover Editor.
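The zipping step might look like the sketch below. It assumes, as with similar ZIP-based document containers, that the `mimetype` entry should come first and be stored uncompressed; the part names and contents are placeholders, and the actual packaging code in the S&S system is not shown in the talk.

```python
import io
import zipfile

IDML_MIMETYPE = "application/vnd.adobe.indesign-idml-package"


def package_idml(parts: dict[str, str]) -> bytes:
    """Zip XML parts into an IDML-style package, mimetype entry first and uncompressed."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("mimetype", IDML_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for name, xml_text in parts.items():
            package.writestr(name, xml_text, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


# Placeholder parts; a real package carries many more files.
idml_bytes = package_idml({
    "designmap.xml": "<Document/>",
    "Stories/Story_u2.xml": "<Story/>",
})

with zipfile.ZipFile(io.BytesIO(idml_bytes)) as check:
    entry_names = check.namelist()
    mimetype_stored = check.getinfo("mimetype").compress_type == zipfile.ZIP_STORED
    story_xml = check.read("Stories/Story_u2.xml")
print(entry_names)
```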

Page 37: Automating book covers w/ XML

37

Here is the Description that we were following. <click> It is tagged as item-content, and a description.

Page 38: Automating book covers w/ XML

38

So the IDML that is delivered to the user--what I call the “white layout”--contains a wealth of structural information. We donʼt turn on the structure view for the user, of course.

Page 39: Automating book covers w/ XML

39

The designer then proceeds to add graphics, photos, color, and to set the type to create the finished layout.

Page 40: Automating book covers w/ XML

40

The result still contains the structure of the template even at this stage. Here the book description is highlighted. The system is actually designed so that if this layout is saved as an IDML, it can be submitted to the system and the corresponding fields we saw in the Cover Editor could be updated with any changes made to the text in the InDesign file. The idea was that this finished cover would contain the final word on what editors, publishers and marketing types wanted to say about the book. However, we uncovered two flaws in our logic.

Page 41: Automating book covers w/ XML

41

Cover Text as Art

For one thing, many covers, like this one, have artwork instead of text. For Children's, this meant that our approach of having cover copy delivered from a database wasn't of much use. There wasn't much text on their covers anyway. But designers in the Adult Division wanted to be able to turn some of the text into art, too. And they were apt to break up or combine frames of text, resulting in the IDML Stories getting out of sync. More importantly, waiting for the final, approved text on the cover has become too much of a luxury. At the time we began this project, total, letter-perfect correspondence between the cover and online copy was mandatory. But online marketing demands that this information be timely. That means it must be out before the cover is even close to being ready for the printer. Weʼre getting to the point that what goes out on the Net does not have to match the cover character for character.

Page 42: Automating book covers w/ XML

Future Directions

“Catalog Automation”

42

We are thinking of other applications for this technology, though. Here is the page from our HTML-based digital catalog for Sing You Home, but weʼre exploring the idea of using a similar approach to building print catalogs, as well.

Page 43: Automating book covers w/ XML

Future Directions

Content Enrichment

43

One of the reasons I pushed so hard for ML, besides XML manipulation, was search. Our Digital Group, with the help of MarkLogic Professional Services and some third-party tools, has built a content enrichment and search tool which will be going live shortly. Here I’ve typed in the phrase “civil rights.” It has turned up a list of relevant titles and extracted appropriate sections of text. The tool also allows the user to drill down into the title. At a demonstration of the tool, someone typed in Qaddafi and turned up a book from years ago, that everyone had forgotten about, written by a personal friend of his. I think this will become a valuable tool, and it shows why saving content in XML is important, even though we don’t have an XML-based workflow.

Page 44: Automating book covers w/ XML

• What it means to be a Trade publisher

• The problem: haphazard copy, unclear format information

• How we use MarkLogic Server to create InDesign layouts

• Limitations of this approach.

• Where it might work in the future.

Conclusion

44

So weʼve covered (read) <click> <click> <click> <click> <click>

Page 45: Automating book covers w/ XML

I would just like to acknowledge the contributions of Frank Rubino and Jason Myatt of MarkLogic. Not only did they build the MarkLogic part of Cover Automation, but they also helped me with this presentation.

Acknowledgements

45

Frank Rubino & Jason Myatt

Page 46: Automating book covers w/ XML

Simon & Schuster

Thank You. Questions?

Steve Kotrch

Director of Publishing Technology [email protected]

Twitter: steveko

46
