One strictly for the ONIX geeks

posted by Rob on December 11, 2010 11:51 AM

I'm not nearly as familiar as Emma is with the ONIX standard and it's time for that to change. But as I'm staring at it now, I'm wondering why they did things the way they did. I hesitate to describe it as 'wrong' because maybe I'm not as au fait with this stuff as I think I am, but I'm certainly finding bits of it 'curious'. Take an excerpt like this:

...
<Measure>
   <MeasureType>01</MeasureType>
   <Measurement>194</Measurement>
   <MeasureUnitCode>mm</MeasureUnitCode>
</Measure>
<Measure>
   <MeasureType>02</MeasureType>
   <Measurement>130</Measurement>
   <MeasureUnitCode>mm</MeasureUnitCode>
</Measure>
<Measure>
   <MeasureType>03</MeasureType>
   <Measurement>18</Measurement>
   <MeasureUnitCode>mm</MeasureUnitCode>
</Measure>
...

In order to find the Measure element for the dimension you're interested in, you have to look inside each Measure and know that it's the code in the MeasureType child element you want to check. Why wouldn't you write it like this instead:

...
<Measure MeasureType="01" >
   <Measurement>194</Measurement>
   <MeasureUnitCode>mm</MeasureUnitCode>
</Measure>
<Measure MeasureType="02" >
   <Measurement>130</Measurement>
   <MeasureUnitCode>mm</MeasureUnitCode>
</Measure>
<Measure MeasureType="03" >
   <Measurement>18</Measurement>
   <MeasureUnitCode>mm</MeasureUnitCode>
</Measure>
...

It seems easier to use, and makes more sense, to label each Measure element with an attribute and leave the child nodes for data, not labels. Is there some reason to do it the way they have?
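To make the comparison concrete, here's a rough sketch of pulling the height out of each layout using the limited XPath support in Python's standard library. The function names are mine, and the DescriptiveDetail wrapper is cut down to just two Measure blocks for illustration, so treat it as an assumption-laden example rather than real ONIX handling code.

```python
import xml.etree.ElementTree as ET

# Layout as in the ONIX excerpt: the type code is a child element.
# (Wrapper trimmed to two Measure blocks for illustration.)
CHILD_FORM = """
<DescriptiveDetail>
   <Measure>
      <MeasureType>01</MeasureType>
      <Measurement>194</Measurement>
      <MeasureUnitCode>mm</MeasureUnitCode>
   </Measure>
   <Measure>
      <MeasureType>02</MeasureType>
      <Measurement>130</Measurement>
      <MeasureUnitCode>mm</MeasureUnitCode>
   </Measure>
</DescriptiveDetail>
""".strip()

# Alternative layout: the type code is an attribute on Measure.
ATTR_FORM = """
<DescriptiveDetail>
   <Measure MeasureType="01">
      <Measurement>194</Measurement>
      <MeasureUnitCode>mm</MeasureUnitCode>
   </Measure>
   <Measure MeasureType="02">
      <Measurement>130</Measurement>
      <MeasureUnitCode>mm</MeasureUnitCode>
   </Measure>
</DescriptiveDetail>
""".strip()


def height_child_form(xml_text):
    """Hypothetical helper: select the Measure whose MeasureType child is '01'."""
    root = ET.fromstring(xml_text)
    return root.findtext("Measure[MeasureType='01']/Measurement")


def height_attr_form(xml_text):
    """Hypothetical helper: select the Measure whose MeasureType attribute is '01'."""
    root = ET.fromstring(xml_text)
    return root.findtext("Measure[@MeasureType='01']/Measurement")


if __name__ == "__main__":
    print(height_child_form(CHILD_FORM))
    print(height_attr_form(ATTR_FORM))
```

In XPath terms the two lookups differ only by the `@`, so the case for attributes here rests more on readability and on keeping labels apart from data than on how hard the selection is to write.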

And if you want to go the whole hog, I don't think much of their naming and code conventions either. If 'mm' can be the 'code' for millimetres, why does '01' need to be the code for height and not 'h'? Plus, the measurements are jumbled in with other information in DescriptiveDetail. I think I'd prefer this:

...
<PhysicalDimensions>
   <Measurement Dimension="h" Unit="mm">194</Measurement>
   <Measurement Dimension="w" Unit="mm">130</Measurement>
   <Measurement Dimension="d" Unit="mm">18</Measurement>
</PhysicalDimensions>
...

Seems to me that's more concise, easier to understand, easier to remember and easier to process if you're writing code to manipulate that data (e.g. selecting it in XPath).
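As a rough illustration of the processing side, here's a minimal sketch of reading the layout I'm proposing with Python's standard library. The PhysicalDimensions element and its Dimension and Unit attributes are my own suggestion from above, not part of ONIX, and the function names are made up for the example.

```python
import xml.etree.ElementTree as ET

# The proposed (non-ONIX) layout from above.
PROPOSED = """
<PhysicalDimensions>
   <Measurement Dimension="h" Unit="mm">194</Measurement>
   <Measurement Dimension="w" Unit="mm">130</Measurement>
   <Measurement Dimension="d" Unit="mm">18</Measurement>
</PhysicalDimensions>
""".strip()


def dimension(xml_text, code):
    """Hypothetical helper: return the text of the Measurement with the given Dimension code."""
    root = ET.fromstring(xml_text)
    return root.findtext(f"Measurement[@Dimension='{code}']")


def all_dimensions(xml_text):
    """Hypothetical helper: map each Dimension code to a (value, unit) pair."""
    root = ET.fromstring(xml_text)
    return {
        m.get("Dimension"): (int(m.text), m.get("Unit"))
        for m in root.findall("Measurement")
    }


if __name__ == "__main__":
    print(dimension(PROPOSED, "h"))
    print(all_dimensions(PROPOSED))
```

Each measurement is one element, so a single attribute test picks it out and there's no sibling element to walk to for the value or the unit.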

It might be unfair to pick on the ONIX for Books standard like this because I've only looked at little pieces of it. And like I say, maybe I'm not as knowledgeable about XML as I think I am. But so far it looks like big chunks of it were decided very broadly in a committee and then the actual implementation was handled by someone who wasn't all that experienced. (But if it turns out there are wise and good reasons for things being the way they are please enlighten me and I'll stand corrected.)

Comments: 2


Rob

See this exact exchange (kludged to show the XML code) between me and our main go-to guy for XML.

Stef Dawson:
"Disregarding the fact that whoever invented the spec didn't think
through how to parse it efficiently, it's the sort of thing the plugin
should be able to handle. Using TypeCodes to distinguish between data
types _inside_ a duplicated outer block at the same level is a pretty
retarded way to go about things from both a data processing and
programming logic standpoint (I think I can hear my old software tutor
gnashing his teeth from here!)

Far better would be something like:

>Text type="quote" format="2">
blah blah this book is great blah
>/Text>

>Measure type="height">
>Measurement>23>/Measurement>
>MeasureUnitCode>in>/MeasureUnitCode>
>/Measure>

(or even >Measurement type="height" unit="in">23>/Measurement>)

Differentiating on attributes is quicker, tidier and more efficient
for scripts that generate the XML feed and far easier for code to
parse than the abortion the standards body came up with! Guess they
had their reasons... (this is where you tell me you invented it and I
feel really small :-)

Heck, even their own readme admits:

"A number of software application frameworks, including Microsoft
.NET, cannot use the ONIX Release 2.1 Schema in its original form
released in November 2004, due to the inability of some XML Schema
Parsers to handle non-deterministic content models." "

Later my response...
>Darren and I had the same take on the ONIX spec as you did but Darren
> pointed out the numbering was probably due to international portability.

Stef's response:
Yeah, makes sense. Even so, if they'd used:

>Measurement type="02" unit="in">23>/Measurement>

it'd be preferable to all that markup fluff!

So you are not alone.

regards
Les Smith


Les,

Thanks for sharing that. It makes me feel like my complaints were probably valid. Having to look inside something (in this case an XML element) to find out what it is, rather than putting a label on the 'outside', just doesn't sound like a good plan. And no, I have no idea why the authors of the standard decided to do it that way, but my suspicion is that they had very little relevant experience and didn't know any better.

(And good point: 'h', 'w' and 'd' wouldn't be very international. But they could still be attributes not elements.)
