Glossary

  • Metadata: descriptive elements that identify a resource such as a document. They include the title, abstract, author, keywords and more.

  • Optical Character Recognition [OCR]: the conversion of images of text (printed or handwritten) into computer-encoded text. The source may be a scanned document, a photo or an image with superimposed text.

  • Reading order: the order in which a reader expects to read the words on a page. The words inside a PDF document are sometimes not stored in this order, because the PDF format was developed mainly as a print-oriented format based on the PostScript language.

  • Snippet: (noun) a small piece or brief extract.

  • Unique Document Identifier: some PDF documents contain a document ID number; this is common for research papers and official publications. Through an online search, Zenreader is able to identify the document and extract the bibliography.
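
The reading-order entry above can be illustrated with a minimal sketch. It assumes a single-column page and words that already carry coordinates. The `Word` class, the `reading_order` function and the `line_tolerance` value are hypothetical names chosen for this example; they are not Zenreader's implementation, which this glossary does not describe.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # distance of the word's left edge from the left of the page
    y: float  # distance of the word's baseline from the top of the page


def reading_order(words, line_tolerance=3.0):
    """Group words into lines by vertical position, then sort each line left to right.

    Assumption: a single-column layout. Multi-column pages need extra logic.
    """
    lines = []
    for word in sorted(words, key=lambda w: w.y):
        # A word joins the current line if its baseline is close to the line's first word.
        if lines and abs(lines[-1][0].y - word.y) <= line_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [w for line in lines for w in sorted(line, key=lambda w: w.x)]


# Words as they might be stored in a file: not in the order a person reads them.
words = [
    Word("world", 60, 10),
    Word("Second", 10, 30),
    Word("Hello", 10, 10.5),
    Word("line", 70, 30),
]
print(" ".join(w.text for w in reading_order(words)))
```

Grouping by baseline before sorting horizontally keeps words from the same line together even when their vertical positions differ slightly.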