Versus: A Framework for General Content-Based Comparisons

Peter Bajcsy; Antoine Vandecreme; Benjamin J. Long; Paul Khouri Saba; Joe Chalfoun; Luigi Marini; Devin Bonnie; Rob Kooper; Michal Ondrejcek; Kenton McHenry

Published

December 5, 2011

Abstract

We present a framework for the execution and dissemination of customizable content-based file comparison methods. Given digital objects such as files, database entries, or in-memory data structures, we are interested in establishing their proximity (i.e., similarity or dissimilarity) based not on their byte representation (i.e., file format or file system metadata) but on the actual information contained within the files (text, images, 3D, video, audio, etc.). As a generalization of traditional content-based search and retrieval approaches, we propose a general piece of cyberinfrastructure that can be used not only for text-based search but also for non-text, content-based comparison in general (e.g., duplicate file identification, detecting changes that occur to a file's information over time, and ground truth data comparisons). The proposed framework abstracts these tasks by breaking comparisons into three reusable components: (1) the loading of digital contents into a content-specific data structure; (2) the extraction of features and feature descriptors representing specific aspects of the contents of that data; and (3) the computation of a numeric content proximity between two such feature descriptors. We describe an implementation of this abstraction as a Java API and a RESTful (Representational State Transfer) service API. Together these provide both a set of tools to support the access and execution of content-based comparisons on local and distributed computational resources (e.g., a desktop or cloud environment) and a library of methods focused on images, 3D models, text, and documents composed of the three.
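The three-component decomposition described in the abstract can be illustrated with a minimal Java sketch. This is not the Versus API: the names ContentLoader, FeatureExtractor, ProximityMeasure, and ComparisonSketch are hypothetical, and the text example (word counts compared by cosine similarity) is an assumption chosen only to make the three stages concrete.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration of the load / extract / measure split.
// None of these type or method names come from the Versus framework.
final class ComparisonSketch {

    /** Stage 1: load raw input into a content-specific data structure. */
    interface ContentLoader<I, C> { C load(I input); }

    /** Stage 2: extract a feature descriptor from the loaded content. */
    interface FeatureExtractor<C, D> { D extract(C content); }

    /** Stage 3: map two feature descriptors to a numeric proximity. */
    interface ProximityMeasure<D> { double compare(D a, D b); }

    /** Runs the three stages on two inputs and returns their proximity. */
    static <I, C, D> double compare(I first, I second,
                                    ContentLoader<I, C> loader,
                                    FeatureExtractor<C, D> extractor,
                                    ProximityMeasure<D> measure) {
        D a = extractor.extract(loader.load(first));
        D b = extractor.extract(loader.load(second));
        return measure.compare(a, b);
    }

    // Assumed loader: lower-case the text and split it into words.
    static List<String> loadWords(String text) {
        return Arrays.stream(text.toLowerCase().split("\\W+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toList());
    }

    // Assumed descriptor: a word-frequency table.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    // Assumed measure: cosine similarity of two word-frequency tables.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // an empty descriptor shares nothing with any other
        }
        return dot / Math.sqrt(normA * normB);
    }

    /** Convenience wrapper wiring the three text stages together. */
    static double compareText(String x, String y) {
        return ComparisonSketch.<String, List<String>, Map<String, Integer>>compare(
                x, y,
                ComparisonSketch::loadWords,
                ComparisonSketch::countWords,
                ComparisonSketch::cosine);
    }

    public static void main(String[] args) {
        System.out.println(compareText("the quick brown fox", "The quick, brown fox!"));
        System.out.println(compareText("alpha beta", "gamma delta"));
    }
}
```

Because each stage sits behind its own interface, a different measure or extractor could be substituted without touching the other two stages, which is the kind of reuse the abstract describes.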

Proceedings Title

2012 Eighth IEEE International Conference on e-Science

Conference Dates

October 8-12, 2012

Conference Location

Chicago, IL

Conference Title

eScience 2012

Pub Type

Conferences

Keywords

content based comparison, cyberinfrastructure, large data collections

Cloud computing and virtualization, Computational science, Data and informatics

Citation

Bajcsy, P., Vandecreme, A., Long, B., Khouri, P., Chalfoun, J., Marini, L., Bonnie, D., Kooper, R., Ondrejcek, M. and McHenry, K. (2011), Versus: A Framework for General Content-Based Comparisons, 2012 Eighth IEEE International Conference on e-Science, Chicago, IL, [online], https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=911821 (Accessed May 6, 2026)

Created December 5, 2011, Updated February 19, 2017
