A content object is any object that is not a collection. Content objects include documents, images, music, video, and other digital content. In FL-Islandora, there are three ways to create a new content object:
Content objects generally include four (or more) datastreams, such as MODS metadata, Dublin Core metadata, a text file called “RELS-EXT”, and a file stream containing the content file like a PDF or JPG. Some content objects include two, three or even more file streams. The RELS-EXT is a file required by Islandora that records important information about the object, like its name and to which collection(s) it belongs.
Objects ingested into a site must have the namespace prefix assigned to the site. The namespace is used as a prefix to the Fedora PID (system identifier) which has the format [namespace-prefix]:[name]. The [name] portion of the PID for content objects is assigned by the system and is always a sequentially assigned number. By policy, names for collection objects should be assigned by the operator creating the collection, and should always begin with an alphabetic character.
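The PID rules above can be checked mechanically before loading. The following is a minimal sketch, not FL-Islandora code: the function name `classify_pid` and both regular expressions are assumptions, since the text only gives the [namespace-prefix]:[name] layout and does not say which characters are allowed in a prefix or a collection name.

```python
import re

# Assumed patterns. The documentation states only that content object names
# are sequential numbers and that collection names should begin with an
# alphabetic character; the allowed character sets are guesses.
CONTENT_PID = re.compile(r"[A-Za-z0-9-]+:[0-9]+")
COLLECTION_PID = re.compile(r"[A-Za-z0-9-]+:[A-Za-z][A-Za-z0-9_.-]*")


def classify_pid(pid: str, site_namespace: str) -> str:
    """Return 'content', 'collection', or 'invalid' for a PID on a given site."""
    prefix, sep, _name = pid.partition(":")
    if not sep or prefix != site_namespace:
        return "invalid"  # missing colon or wrong namespace prefix
    if CONTENT_PID.fullmatch(pid):
        return "content"
    if COLLECTION_PID.fullmatch(pid):
        return "collection"
    return "invalid"
```

For example, `classify_pid("fau:photos", "fau")` is treated as a collection PID, while `classify_pid("ucf:photos", "fau")` is rejected because the prefix does not match the site.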
NOTE: Content objects within FL-Islandora cannot be batch modified, so please use extreme care in loading objects.
Objects created using batch import (the zip file importer), book batch, or offline ingest must have metadata submitted as part of the batch. All of these methods will accept valid MODS or MARCXML.
Note that an object "owner" is by default set to the user who created (ingested or submitted for ingest) the object. You may see a migration-assisted object with an Owner of 'Digitool Migration Assistant' or 'fedoraAdmin'. The owner of any object can be changed by those with sufficient permissions.
Each FL-Islandora site has its own lists of allowed IP ranges and its own Embargo policies (what to show when an unauthorized user tries to display an embargoed item, whether to receive notification of pending embargo releases).
What happens next depends upon the type of object you are creating:
Unitary Content Objects (single content item, such as a PDF file or single image)
Compound Content Objects (multiple content items connected together)
Paged Content Objects (complex view item, such as a book, serial, or newspaper)
Unitary content objects are complete in one primary datastream or file. There may be associated derivative files stored as datastreams, but there is only one primary file. Content models that support unitary content objects are: Basic Image, Large Image, PDF, Audio, Video, and Binary Object. Although the ingest process may vary slightly, all of these content types have essentially the same ingest workflow:
Compound objects are sets of two or more related objects, of any content type, that always display together. They are implemented in FL-Islandora following the Compound Object content model as a parent object consisting of metadata only connected to any associated child objects (regardless of content model type). Ironically, the child objects are created first.
Creating a compound object is a three-step process:
Child Object Pid/Label: Type the title or PID of the child object(s) to be part of the compound object. The data entry field will autocomplete, so you should be able to select the desired object after entering just a few characters.
Enter the title or PID for the first child object and click “Submit”. Repeat for every child object.
Paged objects are hierarchically organized content that consists of individual pages at the lowest level, like books and newspapers. These objects make use of multiple content models, similar to the Islandora Compound Object Content Model with parent objects consisting of metadata only.
With release 7.x-1.6 of Islandora there is the new ability to upload PDF objects to book parents or newspaper issues from which individual page image files can be extracted. In both cases the end result is a book or newspaper issue with individual page images.
NOTE: As of April 2019, there is a new, optional feature that sends uploaded .zip files of Book pages and Newspaper Issue pages to Offline Batch Ingest. This allows users to continue working in the GUI after the .zip file is uploaded, and queues the pages for loading. Loads can then be tracked via your institution's Offline Batch Ingest admin GUI. It is recommended that anyone using this zip page feature set up the offline batch ingest feature by contacting firstname.lastname@example.org.
To create a book, navigate to the collection that the book will be part of. Click the "Manage" tab. The Overview screen will appear by default. Click “+ Add an object to this collection”. Select the Islandora Internet Archive Book Content Model. Complete the metadata for the book and click “Ingest”. When ingest is complete, the Internet Archive BookReader view will appear, showing a book with a title but no content:
Next you will need to add pages to the book parent object. Click the "Manage" tab, then click “Book”. The resulting screen will offer two options, “+ Add page” and “+ Add zipped pages”.
To add pages one at a time, click “+Add Page”. Find the page file (TIFF or JP2) and upload it. Click the “Ingest” button. When ingest is complete, you will see the single page display.
NOTE: this page view has a "Manage" tab, but adding pages is a function of the Book, not the Page.
To add another page you must return to the book level by clicking “Return to Book View”. On the Book View, click the "Manage" tab, then click “Book”. Now you can repeat the steps from clicking “+ Add Page” to add another page.
Uploading pages one by one can be tedious, so there is also a function to add a number of pages bundled together in a single zip file. To use this, at the Book level, click the "Manage" tab, then click “Book”, then click “+ Add Zipped Pages”.
Last Sequence Number: If there are no pages already ingested into the book, this number will default to “0”, otherwise it defaults to the count of pages already ingested into the book. Page numbering will start after the page number entered here. In this example, the first page ingested from the zipped page file will be numbered “2”.
Compressed Images File: Users can browse to locate the zip file of pages and upload it.
Select the language the text is written in from the pull-down (this will default to English so you may not need to adjust). You may want to leave the last sequence number alone (this will default to zero or the current last page of the book), unless there were pages missing from the book or there is some other reason to change the page numbering. Locate the zip file of pages and upload it. Click “Add files to book”.
When the ingest is complete, you will get an updated version of the same screen. Any errors encountered while loading the pages will appear at the top. The “Last sequence number” field will now be updated to include the count of the number of pages ingested into the book.
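The page numbering described above is simple arithmetic. The sketch below illustrates it under the assumption that pages are numbered consecutively starting one after the Last Sequence Number; the function name is hypothetical and the behavior is inferred from this description, not taken from the Islandora code.

```python
def assigned_sequence_numbers(last_sequence_number: int, pages_in_zip: int) -> list[int]:
    """Sequence numbers given to the pages of one zipped upload."""
    if last_sequence_number < 0 or pages_in_zip < 0:
        raise ValueError("both values must be zero or greater")
    first = last_sequence_number + 1  # numbering starts after the entered value
    return list(range(first, first + pages_in_zip))
```

With a Last Sequence Number of 1 and three pages in the zip file, the pages would be numbered 2, 3, and 4, and the field would then show 4 after ingest.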
There is a "Book" training exercise zip file at the bottom of this area.
Newspapers make use of three Content Models: Newspaper (title level), Issue, and Page. Title-level newspaper objects must be created manually via the User Interface. After title-level objects are created, issues can be added to the title (parent) object and then pages can be loaded to the issue.
Creating a Newspaper Title (Parent) Record
To create a new newspaper, navigate to the collection the newspaper will be part of and click the "Manage" tab. The Overview screen will appear by default. Click “+ Add an object to this collection”. Select the Newspaper Content Model. Select a metadata edit form and create metadata at the title level. Note that “Type of resource” should be “text”, and “Issuance” should be “serial”.
To add an issue, go to the newspaper title record. (This will be your default location after creating a newspaper object.) Click the "Manage" tab and click “Add issue”. Select a metadata form and fill it out.
Title: Include the date or enumeration of the issue. E.g. if the newspaper title is “The Globe” the issue title should be “The Globe, January 1, 1882” or “The Globe, v.1 no.2”.
Type of Resource: text
Issuance: single unit
Date Issued: yyyy-mm-dd. It is critical to enter Date Issued in this format to get the correct newspaper tree display.
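Because the newspaper tree display depends on the Date Issued format, it can help to check dates before entering or loading them. This is a minimal sketch with a hypothetical function name; it enforces the yyyy-mm-dd pattern and also rejects impossible calendar dates.

```python
import re
from datetime import datetime


def is_valid_date_issued(value: str) -> bool:
    """True if value is a real calendar date written as yyyy-mm-dd."""
    # The pattern check is needed because strptime alone also accepts
    # unpadded values such as "1882-1-1".
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False  # e.g. month 13 or February 30
    return True
```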
Pages can be added to an issue either one page at a time, or by loading a .zip file containing page images for an entire issue.
Offline batch ingest provides an offline alternative to online ingest and online batch import via the FL-Islandora user interface. At this time, six content models are supported by offline ingest: Basic Image, Large Image, PDF, Book, Newspaper Issues, and Video. This frees up your online connection to FL-Islandora via the user interface for other work and extends FL-Islandora loading and processing capabilities by performing many load operations on the separate FL-Islandora load server.
Prerequisites: a new user must request an FTP account to the FL-Islandora load servers (test and production). Accounts are issued to individual users from an institution and are IP restricted. Please provide email@example.com with a list of individuals who will be using offline batch ingest at your institution, along with their IP addresses. Florida Library Services will create user accounts and also set up an offline batch ingest reporting user interface for your institution. The individual accounts/logins will provide you with access to only your institution's FTP directory on the FL-Islandora load server.
The offline batch ingest workflow is as follows:
Rules for creating packages for offline batch ingest:
Creating a METS File
The METS Editor (SobekCM) can be downloaded and installed and used to create a METS file for your Book and Newspaper Issue packages.
Examples of Packages
The following examples of package structures would all be valid packages (assuming of course that all files within them are valid). Note that there are no naming requirements for content filenames, but pages of newspaper issues without METS files will load in ASCII sort order, so name page files so that they sort into the correct page order, e.g., p001.jpg through p234.jpg, etc.:
A Large Image
A Newspaper Issue
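For newspaper issue packages without a METS file, the ASCII sort order mentioned above is why zero-padded page filenames matter. The sketch below shows the problem with unpadded names and one way to generate padded ones; the helper name, the "p" prefix, and the three-digit minimum width are illustrative assumptions modeled on the p001.jpg example.

```python
def page_filenames(page_count: int, prefix: str = "p", extension: str = "jpg") -> list[str]:
    """Zero-padded page filenames that sort in page order."""
    width = max(3, len(str(page_count)))  # at least three digits, as in p001.jpg
    return [f"{prefix}{n:0{width}d}.{extension}" for n in range(1, page_count + 1)]


# Unpadded names do not sort in page order: page 10 lands before page 2.
print(sorted(["p1.jpg", "p2.jpg", "p10.jpg"]))  # ['p1.jpg', 'p10.jpg', 'p2.jpg']

# Padded names keep their order when sorted.
print(sorted(page_filenames(12)) == page_filenames(12))  # True
```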
<mods xmlns="http://www.loc.gov/mods/v3" mods:xmlns="http://www.loc.gov/mods/v3"> <mods:titleInfo> <mods:title>This is an example of a bad MODS record.</mods:title> </mods:titleInfo> etc.
<mods xmlns="http://www.loc.gov/mods/v3"> <titleInfo> <title>This is an example of a good MODS record.</title> </titleInfo> etc.
<originInfo> <dateIssued encoding="w3cdtf">1881-05-07</dateIssued> </originInfo>
The manifest is an XML file containing instructions to the batch ingest process.
An example of a valid manifest file for a Large Image:
<?xml version="1.0" encoding="UTF-8"?> <manifest xmlns="info:flvc/manifest"> <contentModel>islandora:sp_large_image_cmodel</contentModel> <owningUser>Sally Staff</owningUser> <collection>fau:photos</collection> <owningInstitution>FAU</owningInstitution> </manifest>
Example of a Book object's manifest:
<?xml version="1.0" encoding="UTF-8"?> <manifest xmlns="info:flvc/manifest"> <contentModel>islandora:bookCModel</contentModel> <owningUser>Sally Staff</owningUser> <collection>ucf:floridabooks</collection> <owningInstitution>UCF</owningInstitution> </manifest>
Example of a Newspaper Issue object's manifest. Note that the collection element must contain the PID of the newspaper title object so that the loader can identify the newspaper title to which the issue is to be attached:
<?xml version="1.0" encoding="UTF-8"?> <manifest xmlns="info:flvc/manifest"> <contentModel>islandora:newspaperIssueCModel</contentModel> <owningUser>Jane Jones</owningUser> <collection>fscj:1234</collection> <owningInstitution>FSCJ</owningInstitution> </manifest>
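When many packages are prepared at once, manifests like the three above can be generated rather than typed by hand. The following is a minimal sketch using Python's standard library; the function name is hypothetical, and it writes only the four elements that appear in the examples above.

```python
import xml.etree.ElementTree as ET

MANIFEST_NS = "info:flvc/manifest"  # namespace URI taken from the examples above


def build_manifest(content_model: str, owning_user: str,
                   collection: str, owning_institution: str) -> bytes:
    """Serialize a manifest with the four elements used in the examples."""
    # Registering an empty prefix makes the namespace the default one on output.
    # Note that this registration is global to the ElementTree module.
    ET.register_namespace("", MANIFEST_NS)
    root = ET.Element(f"{{{MANIFEST_NS}}}manifest")
    for tag, text in (
        ("contentModel", content_model),
        ("owningUser", owning_user),
        ("collection", collection),
        ("owningInstitution", owning_institution),
    ):
        child = ET.SubElement(root, f"{{{MANIFEST_NS}}}{tag}")
        child.text = text
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

For a newspaper issue, the `collection` argument would be the PID of the newspaper title object, as noted above.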
The following elements are required in the manifest file:
The following is a complete list of elements permitted in the manifest.xml file:
To submit packages for loading via the offline batch ingest process:
To view the results of your load, point a browser at your site’s Ingest Reports interface. The URL will be: [institution’s islandora site code].admin.digital.flvc.org.
You will see a page that lists all materials loaded into your site by offline batch ingest:
You can filter the results list by date, load status, package name/ID, title, collection, or content type (content model).
By clicking on the Status link you’ll see the full load report that includes a direct link to the Islandora object, if the object was loaded. Otherwise you’ll see an error report and can expect that the referenced package has been moved to the /errors directory in your site’s FTP space.
In the upper right-hand corner of the Ingest Report page is a CSV download button that will download the search results.
Dealing with Errors
All packages that fail to load will be moved to your site’s /errors directory, in alphabetically coded sub-directories. You can download the package from there, correct it locally, and resubmit it, or you can delete the problem package.
The zip file importer can be used to ingest a batch of objects at once. It can be used to load:
MODS metadata and content files
Content files without metadata
MODS metadata without content files
Objects ingested via the zip file importer have the operator name of the submitting operator as the owning user, and will default to “Active” or “Inactive” state accordingly.
The objects to be loaded must be zipped together into a single file of filetype .zip. All objects in the zip file must use the same content model and be intended for the same collection. It is recommended that you use open source software such as 7-Zip to create your zip files for Islandora instead of the proprietary Microsoft NTFS compression available from a Windows context menu/right click.
Files to be associated with each other (e.g. MODS metadata with content files) are matched by filename, so they must have the same filename and different filetype extensions. The filetype extension for MODS files must be .xml and the filetype extension for full text to be indexed must be .txt. Because the Zip Importer will ingest metadata without content and content without metadata, you should take care that metadata and corresponding content files have exactly matching names -- a typo in the filename will cause the two files to be ingested separately.
For example, Large Image objects with metadata might be named:
file1.jp2 file1.xml file2.jp2 file2.xml
PDF files with metadata and full text for indexing might be named:
1443587.pdf 1443587.xml 1443587.txt 1439_2.pdf 1439_2.xml 1439_2.txt
A single zip file can also contain a mixture of pairs of metadata and content files, standalone metadata, and/or standalone content files. A zip file containing:
program_1.pdf program_1.xml 4433256.pdf solo.xml
will create 3 objects: one PDF with metadata, one standalone PDF, and one standalone MODS file.
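Since a filename typo silently splits one object into two, it can be useful to preview how a list of filenames will pair up before zipping them. The sketch below groups files by base name; it is an illustration with a hypothetical function name, and it assumes that matching is exact and case-sensitive and that .txt files count as full text rather than as content.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def plan_objects(filenames: list[str]) -> dict[str, str]:
    """Map each base filename to the kind of object it would produce."""
    groups = defaultdict(list)
    for name in filenames:
        groups[PurePosixPath(name).stem].append(name)

    plan = {}
    for stem, members in groups.items():
        has_mods = any(m.lower().endswith(".xml") for m in members)
        # Assumption: anything that is not .xml or .txt is a content file.
        has_content = any(not m.lower().endswith((".xml", ".txt")) for m in members)
        if has_mods and has_content:
            plan[stem] = "content with metadata"
        elif has_content:
            plan[stem] = "standalone content"
        elif has_mods:
            plan[stem] = "standalone metadata"
        else:
            plan[stem] = "full text only"
    return plan
```

Applied to the four filenames in the example above, this reports three base names: `program_1` as content with metadata, `4433256` as standalone content, and `solo` as standalone metadata.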
NOTE: the ZIP File Loader allows for upload/creation of objects in only one Content Model per .zip file
The MODS file provided should be valid MODS. Florida Library Services strongly recommends validation of MODS files prior to loading with the Zip Importer, as the Zip Importer does not perform validation against the MODS schema during loading. NOTE: The ExceltoMODS Transformer, http://exceltomods.flvc.org, validates against the MODS schema during transformation of Excel spreadsheets into MODS, and so MODS prepared in this way has been validated.
Minimum requirements are that it must include:
a unique IID in the element <identifier type="IID">
a valid owning institution code in the element <extension> <flvc:flvc> <flvc:owningInstitution>
a title in the element <titleInfo><title>
Submitting Institution (<submittingInstitution>) and Other Logo (<otherLogo>) can also be put in the <flvc> extension.
The namespace for the <flvc> extension must be included in the MODS header information:
The example below shows a valid minimal MODS record for the Zip File Importer:
<?xml version="1.0" encoding="UTF-8"?> <mods xmlns="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:flvc="info:flvc/manifest/v1" xmlns:xlink="http://www.w3.org/1999/xlink" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-4.xsd" version="3.4"> <extension> <flvc:flvc> <flvc:owningInstitution>FSU</flvc:owningInstitution> </flvc:flvc> </extension> <titleInfo> <title>Strawberries</title> </titleInfo> <identifier type="IID">FS3518756</identifier> </mods>
NOTE: the supplied MODS record must NOT use namespace prefixes on the MODS elements. I.e., do NOT use the namespace definition mods:xmlns="...." and do NOT create a record like the below:
<mods xmlns="http://www.loc.gov/mods/v3" mods:xmlns="http://www.loc.gov/mods/v3"> <mods:titleInfo> <mods:title>This is an example of a bad MODS record.</mods:title> </mods:titleInfo> etc.
If any of these requirements are not met, the supplied metadata will be ignored, and the Zip File Importer will create a skeleton MODS record that contains only the filename as title:
<mods xmlns="http://www.loc.gov/mods/v3"> <titleInfo> <title>[filename]</title> </titleInfo> </mods>
The skeleton record will not be sent to Mango but it will display in Islandora. The first time an operator tries to update the skeleton record online, the MODS Forms will force the creation of required fields.
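Because the importer falls back to a skeleton record instead of reporting an error, it may be worth checking each MODS file against the minimum requirements before zipping. The following is a rough local pre-check with a hypothetical function name, not the importer's own logic. It does not validate against the MODS schema, and it cannot tell whether an IID is unique or whether an institution code is valid.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
FLVC_NS = "info:flvc/manifest/v1"  # namespace URI as given in the example record
NS = {"m": MODS_NS, "f": FLVC_NS}


def minimum_requirement_problems(mods_xml: str) -> list[str]:
    """List the minimum requirements that a MODS record does not meet."""
    try:
        root = ET.fromstring(mods_xml)
    except ET.ParseError:
        # Also raised for the unbound "mods:" prefix in the bad example above.
        return ["record is not well-formed XML"]

    problems = []
    if "<mods:" in mods_xml:
        problems.append("MODS elements use a namespace prefix")

    def has_text(path: str) -> bool:
        return any((el.text or "").strip() for el in root.findall(path, NS))

    if not has_text("m:identifier[@type='IID']"):
        problems.append("no IID identifier")
    if not has_text("m:extension/f:flvc/f:owningInstitution"):
        problems.append("no owning institution code")
    if not has_text("m:titleInfo/m:title"):
        problems.append("no title")
    return problems
```

An empty list means the record passed these checks only; a skeleton record such as the one shown above would be reported as lacking an IID identifier and an owning institution code.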
Navigate to the collection to which the objects should be added. Click "Manage". This should default to the “Overview” screen.
Click "Collection" to get the collection management screen. Click "+ Batch import objects".
Take the default importer, “ZIP File Importer”. Click "Next".
NOTE: Although the Binary Object Content Model appears when you select the ZIP File Importer, you must use the Binary Object Zip Importer to batch load Binary Objects (the ZIP File Importer will not load the content files).
Fill out the form. Find the zip file to import. Click the content model the import should use.
Select the default namespace, which should be correct. Click "Import".
This is a useful feature for institutions that want to load quantities of objects and then add metadata interactively online.
Content files can be imported without metadata. Any content file in a zip file that has no matching metadata file will be ingested and a minimal MODS record will be created with only the <titleInfo><title> element supplied from the filename of the file.
E.g. if a standalone file named “Flavorcrest.jpg” is imported, this MODS record will be created:
<mods xmlns="http://www.loc.gov/mods/v3"> <titleInfo> <title>Flavorcrest</title> </titleInfo> </mods>
Because the metadata has no IID identifier or owning institution code, these required fields will have to be added the first time the record is edited online.
Metadata files can be imported without corresponding content files. The importer will generate warning messages that derivative files could not be created, according to the content model associated with the import. These messages can be ignored. This option may be used when an institution's workflow is such that one staff person creates and loads metadata only, and after the metadata has been finalized another staff person uploads and adds content datastreams.
After the content of a zip file has been successfully imported, the operator will get a response screen with the message:
Batch complete! View/download simple results or see the watchdog log for details.
The “simple results” link provides a list of all objects ingested with their title and PID, and a link to the content in Islandora.
info: Ingested "Viking Motel Acquisition, June 1988" as islandora:1858. Link: islandora:1858. info: Ingested "Naésa zemlja" as islandora:1859. Link: islandora:1859. info: Ingested "Flavorcrest" as islandora:1860. Link: islandora:1860. info: Ingested "Peaches" as islandora:1861. Link: islandora:1861.
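If you want to keep a record of which PIDs a batch produced, the simple results text can be turned into title and PID pairs. This sketch assumes the line format matches the sample above exactly; the regular expression and function name are illustrative and may need adjusting if the real output differs.

```python
import re

# Assumed format: info: Ingested "<title>" as <namespace>:<name>. Link: ...
RESULT_ENTRY = re.compile(r'Ingested "(?P<title>.*?)" as (?P<pid>[^\s:]+:[^\s.]+)\.')


def parse_simple_results(text: str) -> list[tuple[str, str]]:
    """Extract (title, PID) pairs from the simple results text."""
    return [(m["title"], m["pid"]) for m in RESULT_ENTRY.finditer(text)]
```

The resulting pairs can be written to a spreadsheet or compared against the list of files that went into the zip file.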
The “watchdog log” link provides a view of log messages associated with the import. The entire log can also be viewed from the Administrative menu / Reports / Recent log messages.
Batch Import to Offline Ingest (Books, Newspaper Issues, and Videos)
To a large extent these new features are transparent to users, because the initial steps to either add a .zip file of Book pages or a .zip file of Newspaper Issue pages, or a .zip file containing one or more video files and their associated MODS files remain the same. The differences appear after the .zip files are uploaded:
After uploading the .zip file using one of the current methods, the new code (if enabled for your site) creates an Offline Batch Ingest package from the .zip file and
moves the package to your site's Offline Batch Ingest load queue for batch loading.
The GUI provides a link to your site's Offline Batch Ingest administrative interface (http://site code.admin.flvc.org) where you can track the progress of the load.
Packages that have been successfully transferred are noted as Status "queued". After processing you will see the regular "success", "warning" or "error" status.
1. As usual, first create a book title object and edit its metadata.
2. To add a .zip file of pages, click Manage -> Book -> +Add zipped pages from within the parent Book object. You can load a .zip file of pages to a newly created Book parent object, or you can add pages to an existing Book parent object with existing book pages. Pages will be added to the end of the book.
3. Upload the .zip file of pages, and click "Add files"
4. With the new feature, your .zip file will upload and be passed to Offline Batch Ingest, and you'll receive a message that indicates that your pages will be loaded via offline batch ingest:
5. You'll see in the Offline Batch Ingest admin GUI that your package of pages is queued for loading. Note that the "title" of the load will be the IID of the book object to which pages are being added, along with the date and timestamp of the load (not the title of the book):
6. Once the pages load you'll see a "success" status and details about the load, including a link to the book parent object and a list of page PIDs and links:
Content objects can be moved from one collection to another or shared with another collection. This can be done at either the object level or the collection level. Moving or sharing from the object itself is useful if you have only one or a few objects to move, or if you are already on the object for another reason, for example, reviewing its metadata.
Move or share a single object
Navigate to the object you want to move or share. Click the "Manage" tab. The default Overview display gives you prompts:
1) from the object’s "Manage" tab, 2) from the collection’s "Manage" tab.
Moving objects from a collection’s Manage tab is useful if you want to move all or several members of one collection to another collection. For example, say you have a collection called “Maps” containing maps of North and South Carolina, and you want to break it up into two collections: “North Carolina Maps” and “South Carolina Maps”. You could create a new collection called “South Carolina Maps” and move all the South Carolina maps from the Maps collection into it. Then you could rename the original collection “Maps” (which now contains only North Carolina maps) to “North Carolina Maps”.
Navigate to the collection whose members you want to move. Click the "Manage" tab.
Click “Collection” on the top menu bar and then “Migrate Members” on the left-hand sidebar. You will get the migration form.
Migrate members to collection: this is a pull-down menu with the names of all collections known to your site. Select the collection you want to migrate objects into.
To migrate all objects in the collection, click the box next to "LABEL" to select all and then click the button “Migrate All Objects” at the bottom of the screen.
To migrate selected objects, use the checklist of the titles of all objects in the current collection. You can cherry-pick objects or select a screen at a time by clicking the box before “LABEL”. Note that this won’t select all objects in the collection, only the ones listed on that page. Then click “Migrate Selected Objects”.
Migrated objects will be detached from the current collection and moved to the selected collection.
Sharing Objects in a Collection
Sometimes you want an object to appear in multiple collections at the same time. For example, you might want a historical state map to appear in both your “Maps” collection and your “State History” collection. You can make an object a member of two or more collections by ingesting it into one collection and then sharing it with the other collections. Instructions on how to share one or more objects from a collection’s "Manage" tab are given here. There are two ways to share a content object with another collection:
1) from the object’s "Manage" tab, 2) from the collection’s "Manage" tab.
Navigate to the collection containing the objects you want to share. Click "Manage".
Select “Collection” from the top menu bar and “Share Members” from the left-hand sidebar.
The Share members function is implemented like the migrate function. Select the collection you want to share members with from the “Share members with collection” pull-down. To share all objects in the collection, click the button “Share All Objects” at the bottom of the screen. To share selected objects, use the checklist of the titles of all objects in the current collection. Select the member(s) you want to share from the checkbox list. To select all listed on a page, click the checkbox in front of “LABEL”. Remember this will select only those titles listed on the page, not all members of the collection. Click “Share Selected Objects”. They will now appear in both collections.
Content objects can be shared with collections on other sites. In particular, objects from institutional collections can be shared with PALMM using the instructions above.
Collection(s) within a Collection
1) Make sure the collection in which you are creating a new collection is set up with the "Islandora Collection Content Model (islandora:collectionCModel)". To check this, browse to the collection you will create a collection in, then click "Manage", then click "Collection." Here on the left hand side you will see "Manage collection policy." If the box for "Islandora Collection Content Model (islandora:collectionCModel)" is selected then the collection is set up to accept this content type and you can proceed to the next step.
2) If you are not already within the appropriate collection, browse to the collection in which you will create your new collection. Click the "Manage" tab, then click "Add an object to this Collection". As a reminder, Islandora labels both content items and collections as "objects."
3) Under "Collection PID", enter a PID for your new collection. The Collection's PID has to start with your institution's namespace. The namespace is the part of your site's URL that comes before ".digital.flvc.org". (For example, https://fau.digital.flvc.org has the namespace fau and https://ucf.digital.flvc.org has the namespace ucf.) The namespace has to be lowercase. If you enter a wrong namespace, you will get an error message on the very last screen. If you don't enter a "Collection PID" then Islandora will automatically assign a random number, which is not recommended practice and impacts search engine optimization.
4) Uncheck the box for "Inherit collection policy". This opens up a bunch of checkbox options. Pick the ones for the content that you plan to upload into this new collection. If you select the wrong ones now you can go back and change this setting later.
5) Click "Next".
6) You can ignore the MARCXML file. Click "Next" without doing anything on the MARCXML screen.
7) Enter a "Collection Title". This will show up to people browsing the site. You can also change the title at a later time.
8) Click "Ingest".
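Since a wrong namespace is only reported on the very last screen, the namespace rule from step 3 can be checked in advance. The sketch below derives the namespace from the site URL as described in that step; the function names are hypothetical, and only the prefix rule is checked.

```python
from urllib.parse import urlparse

SITE_SUFFIX = ".digital.flvc.org"


def site_namespace(site_url: str) -> str:
    """Namespace is the part of the host name before .digital.flvc.org."""
    host = urlparse(site_url).hostname or ""
    if not host.endswith(SITE_SUFFIX):
        raise ValueError(f"not an FL-Islandora site URL: {site_url}")
    return host[: -len(SITE_SUFFIX)]


def collection_pid_has_site_namespace(pid: str, site_url: str) -> bool:
    """True if the PID starts with the site's lowercase namespace and a colon."""
    return pid.startswith(site_namespace(site_url) + ":")
```

For example, `fau:maps` passes for https://fau.digital.flvc.org, while `FAU:maps` fails because the namespace has to be lowercase.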
Reorder Child Objects for Compound Content
NOTE: The Inactive Queue function (Simple Workflow Module) has not been enabled in production. Currently, an Inactive Queue is shared by all sites, so institutional sites do not have their own Inactive Queue.
The “inactive queue” allows reviewers to find these suppressed (inactive) items more easily. Staff with the authority to review the inactive queue will see a link on the left-hand sidebar to “Manage inactive objects”. Clicking the link provides a view of the inactive queue.
At this time, only the title of inactive objects displays in the inactive queue. (A suggested enhancement will expand the data available). NOTE: both content objects and collection objects can be inactive, and both will display by title (label) in the inactive queue.
Click the title to display the object. Click “Manage” to display the Properties page for the object. The state can be changed to “Active” from the Properties page.
Adding, Replacing and Deleting Datastreams
Normally datastreams are added to an object at the time of ingest, and you manage them when necessary through other functions. However, occasionally you might want to add, replace or delete a datastream. For example, say a PDF object has been created and full text of the PDF was extracted by the system and stored as the FULL_TEXT datastream. The extracted text is not very good and you want to improve it. To do this you would have to
These can all be done from the Datastreams screen. Navigate to the object in question and click the "Manage" tab. Click “Datastreams”. The screen will list all datastreams for the object. To download a datastream, click the “download” link next to the name of that datastream. To delete a datastream, click the “delete” link. To upload a datastream, click “+ Add a datastream” (and be careful to know your "Datastream ID" since this is a controlled field that determines how the datastream will be used).
Add a Datastream: Provide the name of the Datastream as defined by the content model. If you are replacing a datastream named “FULL_TEXT” the name you provide here must match that exactly.
Datastream Label: This is the label that displays after the Datastream name whenever datastreams are listed. This can be anything you want but it is best to use the same label as the datastream you are replacing.
Upload Document: This allows you to find and upload the new datastream. Click “Add Datastream”.
Running OCR on Book and Newspaper pages
How to create a PDF from a Book