Getting started¶
The philosophy of this package¶
The philosophy of this package is to try to create an isomorphic class structure based on IIIF presentation API, knowing the IIIF standard and Python and autocomplete ideally should be the only requirements to get started. Doing so you don’t need to learn IIIF Presentation API and pyIIIFpres data models because they should be isomorphic so learning one of the two should automatically reinforce the knowledge of the other.
Workaround for the common errors¶
One of the aims of this package is to prevent any possible error in the production and consumption of the manifest. The choice of this philosophy arises from years of finding poor metadata around the web.
At the same time, pyIIIFpres should offer always an easy workaround in case the threat is actually under control.
These global variables are lists or strings that you can modify to allow your specific case they are built from possible threads and recommendations from RFC and IANA and IIIF but in some cases, they can prevent correct choices:
from IIIFpres import iiifpapi3
iiifpapi3.INVALID_URI_CHARACTERS
iiifpapi3.LANGUAGES
iiifpapi3.BEHAVIOURS
iiifpapi3.MEDIATYPES
See the documentation on how to modify them.
There are more drastic approaches to modifying these variables like
setting the value of a property without the set_ method for instance
homepage.set_format("myinvalid/format") can be set directly like
this homepage.format = "myinvalid/format".
Eventually, all the checks can be disabled running your script using the optimization flag:
$ python -O myscirpt.py
Creating a IIIF type and populating it¶
The best method to get started is to have a look at Cookbook and find the relative example in the example folder of pyIIIFpres project. It is easy to modify the provided example and design interactively a manifest that suits your need.
Change the base URL (in the newer version the final slash is required).
from IIIFpres import iiifpapi3
iiifpapi3.BASE_URL = "https://iiif.io/api/cookbook/recipe/0009-book-1/"
manifest = iiifpapi3.Manifest()
When setting the ID use the extendbase_url for appending the
BASE_URL:
annotation.set_id(extendbase_url="canvas/page/annotation")
The ID will be:
https://iiif.io/api/cookbook/recipe/0009-book-1/canvas/page/annotation
The inspect() method can
give you a quick view of the Required and Suggested fields.
manifest.inspect()
Most of the add_ methods return a handler that can be used for
modifying the object.
cnv = manifest.add_canvastoitems()
cnv.inspect()
Using inspect() on the
handler will return only the hints for
populating it. Inspecting nested objects in the terminal could be
painful. .show_errors_in_browser()
opens a browser tab, showing the required and recommended fields.
Don’t forget that each class has also the method .__dict__. For
instance manifest.__dict__ returns a compact representation of the
manifest showing only the types of the items and the ID of the items.
manifest.__dict__
Save a IIIF object¶
Use .json_save()
on the root element to save the JSON file.
manifest.json_save()
Or .json_dumps() for dumping
it as string.
manifest.json_dumps()
Debugging the Code¶
It should be quite clear where the error happens during the insertion of
an invalid value through the iiifpapi module. However, missing
Required statements need some tools to spot them because they are
checked when the manifest/collection is written in the JSON file hence
the module does not show exactly when the error occurred.
The suggested approach is the following:
1. Open ipython console
2. run your script using %run myscript.py
3. activate the debug tool using %debug
4. press u and enter until you see the frame where
the manifest/collection is defined
e.g. mymanifest.json_save("SFMAG_manifest30.json")
5. now use .show_errors_in_browser() function on the istance of your
manifest/collection e.g. mymanifest.show_errors_in_browser()
6. A new tab in the browser should open (try to execute the command again if
it does not).
7. You can use your browser built-in search tool for searching the❌Required string this will bring you exactly where the error occurred
8. Now you can modify your script for correcting the error easily.
Getting the format right¶
pyIIIFpres
checks
the format MIME type based on IANA media types
page,
building the missing templates from the registry and the format for
instance image/jpeg,image/gif etc. etc.
The full list of media types used by pyIIIFpres can be accessed using:
from IIIFpres import iiifpapi3
iiifpapi3.MEDIATYPES
An additional helper class gives you the possibility to access the mediatypes using the dot notation:
from IIIFpres import MediaTypes
In [1]: MediaTypes.audio.mp4
Out[1]: 'audio/mp4'
In [2]: MediaTypes.image.jp2
Out[2]: 'image/jp2'
In [3]: MediaTypes.application.json_seq
Out[3]: 'application/json-seq'
Getting the language right¶
pyIIIFpres checks the language subtags of labels, summaries and other text content against the language subtag registry.
Warning
only language subtags are checked not variants or composite strings.
In this registry, there are more than 190 two-letters subtags and 8022 three-letters subtags, hence you have 28% chance that inserting a random two-letter string will result in a valid subtag and 45% chance that inserting a random three-letters string will result in a valid three-letter tag.
You might want to limit the check to a subset of languages you know are
in your document to avoid these errors. This can be achieved by
reassigning the LANGUAGES global variable:
from IIIFpres import iiifpapi3,BCP47lang
iiifpapi3.LANGUAGES = [BCP47lang.english,BCP47lang.spanish]
# all the rest of your script
Language map object¶
Remember that add_metadata
set_requiredStatement
if left empty return a languagemap
object that can help building multi language support.
reqst = manifest.set_requiredStatement()
reqst.add_label('Provided by','en')
reqst.add_value('Univeristy of Verona','en')
reqst.add_label('Contenuto fornito da','it')
reqst.add_value('Università di Verona','it')
Using a language detector¶
Another possible approach could be to use a language detector. There are many different alternatives to accomplish this task, this StackOverflow answer gives a good overview of the panorama.
This
example
shows a basic implementation using langdetect.
The output of using the language detector on the manifest iiifpapi3 object of 0065-opera-multiple-canvases recipe is the following:
In [3]: check_languages(manifest)
❌ L'Elisir D'Amore seems not to be: it
✅ The Elixir of Love is : en
✅ Date Issued is : en
⚠️ Could not detect language for 2019 but is set to: en
✅ Publisher is : en
✅ Indiana University Jacobs School of Music is : en
❌ Atto Primo seems not to be: en
❌ Atto Secondo seems not to be: en
✅ Gaetano Donizetti, L'Elisir D'Amore is : it
❌ Atto Primo seems not to be: en
✅ Preludio e Coro d'introduzione – Bel conforto al mietitore is : it
✅ Remainder of Atto Primo is : en
❌ Atto Secondo seems not to be: en
Behaviours and strange behaviours¶
To learn more about IIIF behaviours visit: https://iiif.io/api/presentation/3.0/#behavior
IIIFpres firstly checks that the behaviour is inside BEHAVIOURS(
global variable:
from IIIFpres import iiifpapi3
iiifpapi3.BEHAVIOURS
So if you want to use non-standard behaviour you have to append to this list first to bypass the check:
iiifpapi3.BEHAVIOURS.append("mystrangebehaviour")
Then checks if the behaviour is applied to the right IIIF object.
Warning
At the moment pyIIIFpres does not check if the objects referenced by Range
and Collection have the right attributes e.g. Canvas must have a duration
for auto-advance behaviour.
Getting the Image size automatically¶
pyIIIFpres do not provide built-in methods for inferring information about the images. Depending on your need you might find useful different solutions. This for keeping the footprint of the library low and avoiding requirements.
Format not supported |
|
|---|---|
Using ImageAPI |
None |
Pillow |
None. |
Exiftools |
None |
OpenCV |
jpg 2000 |
Matplotlib |
jpg 2000 |
Using ImageAPI¶
import requests
# when you use a proxy you might have to use the original link e.g. "http://localhost:1080/iipsrv/iipsrv.fcgi?iiif=/imageapi//m0171_0/m0171_0visn20_0001a21.jp2/info.json"
iiifimageurl = "http://lezioni.meneghetti.univr.it//imageapi/m0171_0/m0171_0visn20_0001a21.jp2/info.json"
imageinfo = requests.get(iiifimageurl)
jsoninfo = imageinfo.json()
imgwidth = jsoninfo['width']
imgheight = jsoninfo['height']
Exiftool¶
Once installed Exiftool it can be called as a subprocess:
from subprocess import check_output
out = check_output(["exiftool", imagepath])
Metadata = dict((e[:32].strip(),e[33:].strip()) for e in out.decode('utf8').split('\n'))
width = Metadata['Image Width']
height = Metadata['Image Height']
Using Pillow¶
pip install Pillow
from PIL import Image
image = PIL.Image.open("image.png")
width, height = image.size
Using OpenCV¶
pip install opencv-python
import cv2
img = cv2.imread('image.jpg')
width, height, channelsN = img.shape
Reading and modifying JSON compliant with API-3.0¶
pyIIIFpres offers some support for reading and modifying JSON files compliant with API 3.
Important
pyIIIFpres assumes that the JSON file is compliant with API 3,
inspect() and show_errors_in_browser() will highlight only
errors in the new addition of the user to the file.
read_API3_json() maps most of the
IIIF type to IIFpapi3 classes, you can access and modify the elements using
the normal set_ methods:
from IIIFpres.utilities import read_API3_json
mymanifest = read_API3_json('tests/integration/fixtures/0001-mvm-image.json')
mymanifest.items[0].items[0].items[0]
# Out: Annotation id:https://iiif.io/api/cookbook/recipe/0001-mvm-image/annotation/p0001-image
mymanifest.items[0].items[0].items[0].set_id('http:mynewid.com')
mymanifest.items[0].items[0].items[0]
# Out: Annotation id:http:mynewid.com
mymanifest.json_save("revised_manifest.json")
If performance is critical modify_API3_json()
map only the first IIIF type encountered to IIFpapi3 classes leaving all
the rest as dicts.
from IIIFpres.utilities import modify_API3_json
mymanifest = modify_API3_json('tests/integration/fixtures/0001-mvm-image.json')
canvas = mymanifest.add_canvas_to_items()
canvas.set_id(extendbase_url=["canvas","p1"])
canvas.add_label("en","Forgotten painting")
canvas.set_height(1271)
canvas.set_width(2000)
mymanifest.json_save("revised_manifest.json")
The deletion of an IIIF type instance can be done using
read_API3_json or modify_API3_json and removing the entity from
the structure manually:
mymanifest.items.pop(0)
or using the delete_object_byID()
if the instance has an ID:
from IIIFpres.utilities import delete_object_byID
mymanifest = modify_API3_json('tests/integration/fixtures/0001-mvm-image.json')
mymanifest.__dict__
delete_object_byID(mymanifest,id='https://iiif.io/api/cookbook/recipe/0001-mvm-image/page/p1/1')
mymanifest.__dict__
Note
any nested object will be deleted when deleting the parent object!
Tip
If you need supports for reading annotations of both IIIF API 2.1 and 3.0 an excellent package to use could be : python-iiif-annotation-tool
Validating the JSON manifest using the official schema¶
pyIIIFpres does its best to prevent any error but it is wise to check if the resulting .json is valid using the official IIIF API 3.0 schema provided by Glen Robson and IIIF Consortium that can be found here.
A possible solution could take advantage of jsonschema python
package:
# json schema is availabe here https://github.com/IIIF/presentation-validator/blob/master/schema/iiif_3_0.json
# we can use jsonschema module for vallidating
# to install it you can use pip install jsonschema
import os
import jsonschema
import json
if not os.path.exists('iiif_3_0.json'):
import urllib.request
jsonchemadownloadurl = r"https://raw.githubusercontent.com/IIIF/presentation-validator/master/schema/iiif_3_0.json"
with urllib.request.urlopen(jsonchemadownloadurl) as response, open("iiif_3_0.json", 'wb') as out_file:
data = response.read()
out_file.write(data)
to_be_tested = "example_manifest_response.json"
with open(to_be_tested) as instance, open("iiif_3_0.json") as schema:
jsonschema.validate(instance=json.load(instance),schema=json.load(schema))
Improve performance of writing and serving JSON IIIF-objects¶
If you are saving the JSON files in your server there is usually no need to improve the performance. However, if you are planning to create the JSON IIIF objects (manifests, collection…) “on-the-fly” each time the user requests them or caching the results temporarily you might want to improve the writing speed of the manifest.
tests/performance/performance_test.py
offers a way to test the performance of your server in serving a manifest with 2000 canvas and 2000 annotations, and one of 4000 canvas and 40000 annotations. Feel free to do a pull request with the results of the test which are appended to the csv files in the same folder.
If the speed is not enough for your needs, you can try one of the
following actions or both:
1. Install orjson: pip install orjson and use
myIIIFobject.orjson_save()
or myIIIFobject.orjson_dumps()
` instead of myIIIFobject.json_save() or .json_dumps()
2. ⚠️Check the IIIF
JSON object once and then run optimized Python code using the -O
flag e.g. python -O 0001-mvm-image.py⚠️
orjson is a much faster parser compared to the standard json
module.
Important
The -O flag ⚠️removes all the assertions and most
of the helper classes⚠️. Hence you should use it with caution. One
strategy could be to check (without the flag) if the IIIF Object is valid
when you insert it in your digital library or you modify it and then
use the optimized version to serve it, without the need of using other
tools.
An Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz using pyIIIFpres can
produce 4000 canvas and 40000 annotations in 2.26 seconds, using orjson
in 1.19 seconds and with the optimization in 0.48 seconds.