ITL Summer Undergraduate Research Fellowship (SURF) Projects 2022

Multimodal image registration for fluorescence guided surgery


Fluorescence guided surgery is an important tool for surgeons to accurately identify and remove cancerous tumors. This work is motivated by the spatial misalignment of real-time streaming brightfield images and fluorescent images with tumor indications acquired by a fluorescence guided hand-held imaging system during a head and neck surgery. This spatial misalignment poses challenges for a surgeon deciding where to remove tumor tissue, a decision with significant consequences for the patient.

The problem of spatially aligning (registering) two multimodal images involves designing an automated method for estimating registration transformation parameters. The challenges include (a) achieving high spatial accuracy for tumor tissue removal, (b) overcoming limitations of existing algorithms that are optimized for monomodal images and assume many spatial features in both modalities, and (c) decoupling the intermodality transformation and registration tasks in multimodal algorithms.

We approached the multimodal registration problem by (a) creating ground truth data by manually registering paired images, (b) evaluating the effectiveness of traditional image registration algorithms such as SIFT, and (c) training and testing an unsupervised generative adversarial network (GAN) called NeMAR. The NeMAR method consists of a spatial transformation (registration) network, an intermodality translation network, and a discriminator network.

By testing different configurations of NeMAR with artificially generated training images, training iterations, and network types, we found the optimal NeMAR configuration with respect to our ground truth registered images. The method’s accuracy increases with the number of input image pairs, but remains about the same for over 200 epochs of training.

In summary, while machine learning methods show promise in multimodal registration tasks, a robust GAN-based method would require a large training dataset sampling from a variety of surgical environments. Future research will compare a supervised machine learning approach to our unsupervised GAN-based approach with more training data.

Enhanced Viewing of 3D Objects Scanned using Photogrammetry


Most of a museum's collection is held in storage due to a lack of space for public displays. One solution for displaying these stored artifacts is to create 3D models of them. This can be done using photogrammetry, a technique for creating 3D virtual models of objects by taking many pictures of an object from different angles and using software that inputs camera images in order to reconstruct a virtual model mesh. These models are then saved as glTF files. glTF (GL Transmission Format) is a file format used to store 3D models and scenes, and is becoming an ISO (International Organization for Standardization) standard.

This study focuses on features that can be implemented to improve the user experience of viewing imported glTF models. Implemented features are presented to the user as a series of tools that can be interacted with through an on-screen HUD (heads-up display). These tools include a light that follows the mouse cursor to brighten a model and annotations for describing individual parts of a model. Annotations are presented to the user through a separate HUD window that appears when clicking on an object. All tools were developed in A-Frame, a web framework that uses HTML and JavaScript to create 3D scenes that are viewable through a web browser and virtual reality devices.

Visualizing Cybersecurity Vulnerabilities and their Role in Recent Cyber Attacks


The influx of recent large-scale cyber-attacks has created the need to understand how known cybersecurity vulnerabilities impact the integrity, availability, and confidentiality of network infrastructures across all business and government sectors. To aid in cybersecurity awareness efforts, NIST and its team of researchers are working to furnish the cybersecurity community with well-informed datasets and metrics. The goal of this project is to capture the process of enhancing cybersecurity-related data through a wide range of visualization software and resources. Utilizing NIST’s National Vulnerability Database (NVD), a breakdown of verified cyber vulnerabilities with each vulnerability’s criticality score and influencing factors, we can identify some of the most common types of network and system vulnerabilities. Together with additional open-source resources such as CISA, the U.S. Department of Health and Human Services Office for Civil Rights, and others, we can establish multipoint connections and create approximate reference relationships between the NVD’s vulnerability data and a significant number of reported cybersecurity incidents. From these connections we are then able to create visualizations that depict the correlations and possible influencing factors between many of the published vulnerabilities and recent incidents and breaches. By utilizing multiple data visualization services like Power BI, Splunk, and Elasticsearch, we can then create unique visualizations and compare similar findings across the different services to validate the results. Once this process is complete, we can apply a method for tagging the data that will assist information security personnel and developers in determining how to prioritize implementing patches for these vulnerable systems.

Geometric augmentations to file identifiers in file system forensics


Digital forensics is a process that is key to examining and interpreting data in cyber-related investigations. File system forensics makes up a significant portion of digital forensics, as it involves logically sorting through hard drive storage to determine file creations, deletions, and other data essential to event reconstruction.

Important to the functionality of file systems is the principle of namespace uniqueness, which uses file paths and names as identifiers that can distinguish file objects. Several libraries are used in digital forensics, one of which is The Sleuth Kit (TSK). Within it is a command, 'fiwalk', that converts the metadata of raw disk images into Extensible Markup Language (XML). Its output does not guarantee namespace uniqueness, because it also reports unallocated ("deleted") files. This reporting of unallocated files means that file names cannot be relied on as identifiers. Because unallocated files need to be reviewed, code changes that incorporate new identifiers for file objects become necessary. This work evaluates a practice that identifies the start of an index node (a file's attributes), the start of the directory entry, and the start of the file's content, producing a three-dimensional address for each file object. With this practice, reported results from a 2012 paper that contain a measurement discrepancy will be corrected, and this research will enable better cross-tool comparison.
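
The idea of a three-part address can be illustrated with a minimal sketch. The class name, field names, and byte offsets below are hypothetical and chosen only for illustration; they are not taken from fiwalk or TSK output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileAddress:
    """Hypothetical three-part address made of byte offsets into a disk image."""
    inode_offset: int    # start of the index node (the file's attributes)
    dirent_offset: int   # start of the directory entry
    content_offset: int  # start of the file's content


# Two file objects reported under the same path, for example one allocated
# file and one deleted file. All values are made up.
records = [
    ("/docs/report.txt", FileAddress(4096, 81920, 1048576)),
    ("/docs/report.txt", FileAddress(4352, 81984, 2097152)),
]

paths = {path for path, _ in records}
addresses = {address for _, address in records}

print(len(paths))      # number of distinct paths
print(len(addresses))  # number of distinct three-part addresses
```

In this toy data the path alone collapses both objects into one identifier, while the three-part address keeps them distinct.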

Addressing the Causes and Consequences of AI Failures


Artificial Intelligence (AI) has become increasingly prevalent in nearly all areas of life. It now controls our thermostats, drives our cars, produces our electronics, recognizes our faces, evaluates our resumes, predicts our purchases, and so much more. But what happens when AI systems fail, and how can we learn from these failures to prevent such incidents in the future? Our project proposes a framework for characterizing these AI failure incidents, and provides a structured way of documenting them in an online repository.

This repository is designed to address some key concerns. First, it will provide all known, verifiable information about the AI failure incidents in a convenient, searchable manner to allow users to discover and learn about these incidents with some level of technical depth. Second, it will allow users to report incidents as new ones are discovered, ensuring that the data remains up to date. Lastly, it will address the issue that machine-learning-based AI systems tend to be black boxes. While such systems often succeed in achieving remarkable levels of accuracy, they rarely provide much understanding of their decision-making process or the factors that influence their decisions. Our proposed characterization of the AI failures will help us gather information that could shed light on this aspect.

Regular software vulnerability documentation efforts, such as NIST's National Vulnerability Database (NVD), require some knowledge about the inner workings of the code to be analyzed effectively. Such details are a lot harder to garner when it comes to AI/ML failures. Our proposed framework - Failures of Artificial Intelligence Learning Systems (FAILS) - captures the causes of the incident, the sources of weaknesses involved, and a measure of the impact that the failure caused, without needing to know the details of the code. In short, it captures the information needed to help ensure that the failure does not happen again.

High-dimensional consensus mass spectra comparison


Mass spectrometry (MS) is an analytical chemistry technique for analyzing compounds. It provides a signature--called a mass spectrum--that can be used for compound discrimination. That signature is a scatterplot of charged fragments of the substance. One popular application is forensic chemistry, where drug chemists are trying to determine whether seized evidence is an illicit drug.

The traditional method for discriminating mass spectra in forensic chemistry is to bin the scatterplot into a vector (essentially a histogram) and take the cosine similarity between the vectors. While generally effective, this method can occasionally lead to misidentifications. We recently developed two novel methods for incorporating measurement variability when comparing mass spectra to limit the likelihood of misclassifications.
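
The traditional comparison described above can be sketched in a few lines. This is a minimal illustration that assumes unit-width bins and uses made-up peak lists; the function names are hypothetical and the binning details of any particular forensic workflow may differ.

```python
import math


def bin_spectrum(peaks, max_mz):
    """Bin (m/z, intensity) peaks into unit-width bins, summing intensities.

    Unit-width bins are an assumption made for this sketch.
    """
    vector = [0.0] * (max_mz + 1)
    for mz, intensity in peaks:
        vector[int(round(mz))] += intensity
    return vector


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up peak lists, not real measurements.
spectrum_a = [(58.1, 100.0), (91.0, 40.0), (134.1, 10.0)]
spectrum_b = [(58.0, 100.0), (91.1, 35.0), (65.0, 5.0)]

vec_a = bin_spectrum(spectrum_a, max_mz=150)
vec_b = bin_spectrum(spectrum_b, max_mz=150)
similarity = cosine_similarity(vec_a, vec_b)
print(similarity)
```

A value near 1 means the two binned spectra point in almost the same direction, which is why closely related compounds can be hard to separate with this measure alone.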

The first method works by binning the mass spectra--identical to the traditional approach--but then uses the mean and standard deviations of the bins across replicate measurements to form a summary-statistic vector. The second method works by taking the n highest y-valued points in the mass spectra and finding the mean and standard deviation across replicate measurements of that value and using the statistics to represent the compound. We use these summary-statistics as a more informative way to compare compounds.
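
As a rough illustration of the first method's summary step, the sketch below computes a per-bin mean and sample standard deviation across replicate binned spectra. The function name and the replicate values are hypothetical, and the way the statistics are subsequently compared between compounds is not shown because the text does not specify it.

```python
import statistics


def summarize_replicates(replicates):
    """Per-bin mean and sample standard deviation across replicate spectra.

    Each replicate is a list of binned intensities of the same length.
    At least two replicates are required for a sample standard deviation.
    """
    means, stdevs = [], []
    for bin_values in zip(*replicates):
        means.append(statistics.mean(bin_values))
        stdevs.append(statistics.stdev(bin_values))
    return means, stdevs


# Three made-up replicate measurements of the same compound, three bins each.
replicates = [
    [100.0, 40.0, 0.0],
    [98.0, 44.0, 0.0],
    [102.0, 42.0, 0.0],
]

means, stdevs = summarize_replicates(replicates)
print(means)
print(stdevs)
```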

We have implemented these methods in C and performed preliminary evaluation using experimental data collected with two different types of mass spectrometers. We have found good performance in the discrimination of current drugs of interest (methamphetamine vs phentermine, nicotinamide vs isonicotinamide) and are currently evaluating the performance of these new methods across a larger test set of mass spectra that are difficult to discriminate by the traditional method, including applications outside of seized drugs.

Exploring Graph Analytics on Nisaba GPU cluster with cuGraph


Graph algorithms and graph analytics are designed for manipulating and analyzing graph-structured data to determine the relationships between graph objects and the structural characteristics of a graph as a whole. They have been adopted and used heavily in fields such as social networking, route optimization, and fraud detection. A major issue with modern graph analytics is that it is usually challenging to run the algorithms quickly or with high computational efficiency at a large scale. Graphics Processing Units (GPUs) can be utilized to accelerate graph data analysis and machine learning. Recently, NVIDIA produced the open-source graph analytics library cuGraph, which operates directly on GPU DataFrames and provides a collection of GPU-accelerated graph algorithms with a NetworkX-like API that can be treated as an efficient graph analytics solution for Python users.

The purpose of this project is to evaluate benchmark graph analysis algorithms on NIST’s Nisaba GPU cluster and compare the quantitative performance of cuGraph with other CPU-based graph analysis tools, such as NetworkX and NetworKit. Both synthetic and real world datasets are employed to benchmark the common network analysis algorithms among six categories, namely Katz for centrality analysis, Louvain for community detection, Breadth-First-Search (BFS) and Single-Source-Shortest-Path (SSSP) for graph traversal, Weakly Connected Components for component detection, and PageRank for link analysis.
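
The shape of such a benchmark can be sketched with a small timing harness. The example below is only a stand-in: it uses a pure-Python breadth-first search on a synthetic ring graph so that it runs without any graph library, and the helper names `bfs_distances` and `time_call` are hypothetical rather than part of the project's code.

```python
import time
from collections import deque


def bfs_distances(adjacency, source):
    """Breadth-first search returning hop counts from the source node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def time_call(func, *args, repeats=3):
    """Return the best wall-clock time over several runs and the last result."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


# Synthetic ring graph: node i is linked to i-1 and i+1 (mod n).
n = 1000
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

elapsed, dist = time_call(bfs_distances, ring, 0)
print(f"BFS over {n} nodes: {elapsed:.6f} s")
```

The same wrapper could time the equivalent call in each library under comparison, so that every tool is measured on the same input and with the same repeat policy.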

Furthermore, cuGraph supports multi-GPU and multi-node operations (MNMG) in conjunction with Dask (Dask cuGraph). Dask cuGraph was also benchmarked alongside cuGraph and NetworkX. Through our reproducible experimentation, we were able to identify large performance increases when using the cuGraph library compared to both NetworkX and NetworKit, as well as unprecedented scalability of graph analytics using multiple GPUs. A GitLab repository was created to allow future users to test the cuGraph benchmarks with their own specifications and provide implementation examples for using the library.

Dynamic Access Review and Control Implementation and Enforcement (DARCIE)


As cloud services are becoming more widely adopted, the amount of data available to members of an organization is vastly increasing along with the risk of data breaches. This project develops an access control mechanism that dynamically reviews, implements, and enforces access control policies in real time. The mechanism ensures granularity of control through privilege access management, allowing the system to control user access to resources. The access control mechanism enforces zero-trust policies so that users are continuously authenticated and are granted or denied access to sensitive information based on their geolocation, the organization’s network availability, and their historic pattern of accessed resources.

Our proof-of-concept system uses two devices to simulate a user accessing local or cloud data. A virtual machine acts as the end user's device and another device acts as a router that simulates different geolocations from where data in the cloud is accessed. The system demonstrates a dynamically changing policy generated by the state of a sensor and enforced by the kernel using Security-Enhanced Linux (SELinux). In this case, the demo system limits a user’s access to some system resources based on the device’s connection to an access point, simulating a dynamically generated access policy based on geographic location. Future work will focus on access control policy review and control implementation and enforcement rules also derived from the user's historic pattern of accessing the resources of interest.

Benchmarking Queries from Zeno against FCPW


NIST has developed software called Zeno, which estimates material properties from a geometric model of a particle of that material. One of the main computational tasks Zeno performs is to compute the closest point on the model to a query point. In addition, a newly proposed algorithm in Zeno would need to determine whether a part of a geometric model is contained within another. Currently, Zeno uses an internally developed library to compute its closest point queries. However, another open-source library may prove to be more efficient. In this project, we benchmark the closest point and contains queries performed by the current Zeno library against those performed by the “Fastest Closest Points in the West” (FCPW) library. The results of these benchmarks will help us decide whether the internally developed Zeno library should be replaced with the FCPW library when implementing the new Zeno algorithm.

To obtain the benchmarks, we created C++ programs using either library. Users can specify a .obj file for the program to construct its geometric model, a query type (either the closest point or contains query), and a number of random query trials to run. The programs will time how long it takes to construct the geometric model (preprocessing time) and how long it takes to compute all the query trials. We then used Python scripts to calculate benchmarking statistics for different .obj files, query types, and trial runs.

These statistics were plotted using various double bar graphs to help visualize patterns and directly compare each library's preprocessing and query times. Early tests suggest that the FCPW library is more efficient for a larger number of trial runs and is less error-prone than its existing counterpart. However, through more in-depth testing and analysis, we will be able to determine whether the FCPW library will be optimal for Zeno’s next implementation.

Creating an Algorithm for Searching RNGs to Link with Test Results


Random number generators (RNGs) are often used in many aspects of everyday life from simulation and decision making to video games and other recreational activities. For a category of objects used so often, there must be a reliable method to test the quality of individual objects in that category. One of the most popular methods to test RNGs today is through a software library known as TestU01. Unfortunately, despite being effective at testing the quality of RNGs, TestU01 is expensive to run with the biggest test battery, BigCrush, consistently taking multiple CPU hours to test one RNG, which will inevitably take more wallclock hours. The original task was to research and figure out how to store RNGs and their TestU01 test results in a database such that they would be searchable, but figuring out a working algorithm to make said RNGs easily searchable ended up being so big that it turned into a project of its own.

Initially, a lot of time was spent on reading about RNGs and experimenting with the TestU01 software library in order to gain an understanding of TestU01 and the relevant RNGs. While becoming acquainted with them, we were also thinking of ideas as to how we could classify RNGs such that they would be searchable. Many potential algorithms were thought of, but the algorithm eventually proposed contains features from multiple of the potential algorithms we came up with along the way. It ended up being complicated to explain, but it should be relatively easy to use. This algorithm will likely be used in the database that the original project idea was supposed to create. However, it will also be usable in other contexts as long as they involve TestU01.

Interactive Online Histogram-Based Visualization of AI Model Fingerprints


Previously, NIST has generated hundreds of thousands of artificial intelligence (AI) models for the TrojAI Challenge focused on detecting poisoned (trojaned) AI models. The main motivation for this project is to support discoveries/analyses of relationships between various clean and poisoned AI models by measuring their model utilization and relating it to Trojan characteristics.

In order to draw connections between AI models, the problem lies in creating interactive and traceable histograms that allow researchers to group AI models according to their characteristics, select pairs of AI models to perform qualitative/quantitative comparisons, share and discuss AI model comparisons remotely. Challenges include: interactivity over thousands of data points, traceability of histogram contributing points (AI utilization fingerprints) to their training images, and reusability of existing libraries and of the visualization prototype.

Our approach is based on the D3 JavaScript Library and Papa Parse CSV parser followed by the design of interactive, traceable, and reusable histograms. Histograms are dynamically created based on AI model attributes, including architecture name, predicted classes, Trojan triggers, and measurement probes. By selecting two contributing data points to a histogram bin, a side-by-side comparison of two AI model utilization fingerprints is enabled to quantify AI model similarities.

The resulting visualization presents a histogram of AI model utilization fingerprints with drop-down menus to allow users to select attributes for binning. Interactive images in histogram bins can be selected, new comparisons of utilization values are rendered, and buttons can trigger computations of distribution statistics.

Implementing Real Time Constraints in Hedgehog API


An operating system (OS) is system software. Among its various capabilities, the OS can manage multiple threads and rapidly switch between their executions. A real-time operating system (RTOS) provides more fine-grained control over multithreaded behavior, allowing for a deterministic response and guaranteed execution time. For example, threads with higher priority values in an RTOS would be guaranteed to run over threads with lower priorities.

The ability of an RTOS to provide guaranteed real-time responses is especially significant for jobs needing consistent responses within a time constraint, such as monitoring a metal additive manufacturing process in real time by keeping up with data collected from a high-speed camera.

Over the past several years, NIST has been developing a C++ library called Hedgehog, which creates task graphs for algorithms to obtain performance across CPUs and multiple co-processors. The library relies on the OS to schedule its threads and provides no real time guarantees.

The focus of this research is to extend Hedgehog to provide access to real-time priorities and scheduling algorithms, so that applications utilizing Hedgehog can be more deterministic when launched on an RTOS. In this presentation, we will present the implementation efforts to add the real-time capabilities to Hedgehog, and the associated performance costs. To evaluate the performance, we have implemented two algorithms: (1) the Hadamard product and (2) matrix multiplication. We will explore the performance behaviors of these algorithms with and without real-time constraints by varying priorities and thread configurations within the algorithms.

Translating Mathematica Source Code to a Presentable LaTeX Format


Mathematica is a powerful programming language that is often used to handle and process mathematical data and equations. Mathematica is powered by the Wolfram Language, enabling it to define, display, and calculate essentially any level of mathematics, namely hypergeometric series in this use case. While Mathematica is well suited to manipulating, defining, and calculating these series, it is often very difficult to read and present longer equations. Through utilizing the programming language Perl, string analysis, regular expressions, and the Wolfram Engine, provided Mathematica source code is translated into the markup language LaTeX. The result is a much more user-friendly and discernible view of the hypergeometric series and other expressions contained within, and the ability to export these results easily. Translating Mathematica source into LaTeX allows for the intense computational power of Mathematica to be combined with the compatibility and readability provided by LaTeX to display the results.
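
The project itself used Perl; purely to illustrate the regular-expression style of translation, here is a small Python sketch. The two rewrite rules and the function name `to_latex` are hypothetical examples and cover only `Sqrt[...]` and `Pochhammer[a, n]`; a real translator would need far more rules and proper parsing of nested expressions.

```python
import re

# Each rule matches an innermost call (no nested square brackets inside) and
# rewrites it. These two rules are illustrative assumptions only.
RULES = [
    (re.compile(r"Sqrt\[([^\[\]]*)\]"), r"\\sqrt{\1}"),
    (re.compile(r"Pochhammer\[([^\[\],]*),\s*([^\[\],]*)\]"), r"(\1)_{\2}"),
]


def to_latex(expr):
    """Apply the rules repeatedly until the expression stops changing.

    Repeating lets inner calls be rewritten first, so that the outer call
    becomes an innermost call on a later pass.
    """
    previous = None
    while previous != expr:
        previous = expr
        for pattern, replacement in RULES:
            expr = pattern.sub(replacement, expr)
    return expr


print(to_latex("Sqrt[x]"))
print(to_latex("Sqrt[Pochhammer[a, n]]"))
```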

Scientific Reproducibility of AI Trojan Detector Results


AI Trojans are malicious and intentional attacks that change the behavior of an AI by inserting hidden classes. To motivate research into Trojan detectors, NIST administered the TrojAI competition, where teams submit algorithms that detect Trojan AI models. The detector algorithms are known to output slightly different results across systems. These differences are problematic for scientific study of the algorithms because they mean that results aren’t reproducible. This problem was the motivation for my NIST SURF project in which my mentor, Derek Juba, and I researched how algorithms submitted to the TrojAI competition behave when run in different environments. Submitted algorithms are containerized using Singularity, which allows them to be easily run on a broad range of machines. We tried to test the algorithms on as many combinations of software and hardware as possible (CPU core count, GPU drivers, etc.) in order to deduce potential causes of differing results.

We theorized that one of the main reasons for differences in the results across systems was changes in the order in which floating point arithmetic operations were being performed. With this in mind, we attempted to quantify the uncertainty resulting from the choice of system without running the container on different systems. We simulated different orders of operations by tweaking the weights and biases of an AI model by a small amount. We used multiple random samples of such tweaks to find the variance we can expect in results if someone were to run an algorithm on a given model across different machines. Early analysis of the data suggests that results produced on other machines agreed with the variance we predicted with our tweaks and that the statistical distributions of tweaked models are largely reproducible across machines. Additionally, we propose that the variance of the tweaked distributions can be used to score the confidence of detector algorithms.
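
Both ideas can be illustrated with a toy example. The first lines show that floating-point addition is not associative, so regrouping the same terms can change the result. The rest is a minimal sketch of the perturbation idea on a single linear unit; the relative-noise scheme, the function names, and all numeric values are assumptions made for illustration and are not the project's actual procedure.

```python
import random
import statistics

# Floating-point addition is not associative: regrouping changes the rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)


def model_output(weights, bias, inputs):
    """A single linear unit standing in for a full AI model."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def perturbed_outputs(weights, bias, inputs, scale, samples, seed=0):
    """Outputs after multiplying each parameter by (1 + small random noise).

    The relative perturbation of size `scale` is an assumed noise model.
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(samples):
        tweaked_w = [w * (1.0 + rng.uniform(-scale, scale)) for w in weights]
        tweaked_b = bias * (1.0 + rng.uniform(-scale, scale))
        outputs.append(model_output(tweaked_w, tweaked_b, inputs))
    return outputs


weights, bias, inputs = [0.5, -1.25, 2.0], 0.1, [1.0, 2.0, 3.0]
outputs = perturbed_outputs(weights, bias, inputs, scale=1e-6, samples=100)
spread = statistics.pvariance(outputs)
print(spread)
```

The variance of the perturbed outputs then serves as an estimate of how much a result might move between machines, without having to run on those machines.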

Multimodal Fusion with Modality-Specific Factors for IEMOCAP dataset


In the scope of human-computer interaction, technology that can quickly analyze and identify emotion from varying data sources is a coveted development. Potential applications of emotion recognition span from healthcare to gaming, only increasing demand for methods with efficient analysis and identification. Humans convey emotion through various mediums, the most common of which include speech, facial expression, and body language. Emotion recognition frameworks are built upon foundational fusion methods, which synthesize various data modalities into features that are used by prediction algorithms. This work mainly focuses on processing speech, text, and video data, and extracting the features from multiple modalities to develop a fusion model for emotion recognition tasks. We consider the IEMOCAP benchmark dataset, processing spliced data from modalities that include features from audio data, video data, and embeddings from text data. These three modalities were processed into multimodal representations to recognize human emotions.

Making TRECVID Results More Accessible and Coherent


NIST has run the TREC Video Retrieval Evaluation (TRECVID) program since 2001, allowing institutions to evaluate how successful their systems are at retrieving video content from textual queries. Because the results of these evaluations were simply sent back to the submitting institution(s), discussed at the annual TRECVID workshop, and reported only in published papers, there was no other means for teams or the public to examine the results. The website developed in this project also enables the display of data in more organized and visually appealing ways, such as playing the video results corresponding to the tested queries, based on different result conditions across participating teams.

The data from these evaluations was also stored locally, with minimal organization, making it difficult to perform many statistical analyses. Building a comprehensive web interface with a suitable relational database to house the TRECVID result information was the clear solution to the problem. With an easy-to-use website, the information not only becomes more easily accessible to participating institutions, but they can also compare their tools across like systems and over time. The website was developed with a focus on simplicity and maintainability, while also striving to remain lightweight. All data is displayed in simple tables, with a user interface that allows for easy navigation and finding of important data points with visualized results.

Term and Relation Extraction in Mathematical Texts


There exist a variety of Natural Language Processing tools for term and relation extraction. Examples include Parmenides, a framework that applies structured and normalized terms to represent natural language, as well as DyGIE++, a deep learning system for entity and relation extraction. However, while these tools may be effective in extracting terms from scientific texts, their performance is less substantial with mathematical texts.

The two tools have previously been tested on their ability to extract terms from a collection of abstracts in the Theory and Application of Categories (TAC) journal. Parmenides extracted many valid mathematical terms; however, it also extracted several times as many non-term phrases. We now hypothesize that term candidates that are part of relations, that is, subject-verb-object patterns, are more likely to be terms.

Thus, a filter that removes words that cannot be found in relations reduces false-positives generated by the Parmenides term extractor.
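
A minimal sketch of such a filter follows. It assumes that relations are available as (subject, verb, object) triples and that a candidate counts as "in a relation" when it exactly matches a subject or object, ignoring case; both are simplifying assumptions, and the sample data is made up rather than taken from Parmenides output.

```python
def filter_by_relations(candidates, relations):
    """Keep candidates that occur as the subject or object of some relation."""
    in_relation = set()
    for subject, _verb, obj in relations:
        in_relation.add(subject.lower())
        in_relation.add(obj.lower())
    return [term for term in candidates if term.lower() in in_relation]


# Made-up term candidates and subject-verb-object triples.
candidates = ["monoidal category", "this paper", "functor", "main result"]
relations = [
    ("functor", "preserves", "limits"),
    ("monoidal category", "admits", "braiding"),
]

kept = filter_by_relations(candidates, relations)
print(kept)
```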

In the case of DyGIE++, the model was retrained on TAC abstracts using author-provided keywords as training data. Since the model was trained on more domain-specific text, it performed better than the default model.

These measures increased the precision and recall of both tools by a noticeable margin. In future research, we will utilize this term extraction for the creation of comprehensive knowledge graphs for mathematical domains. Further, the relations extracted by Parmenides and DyGIE++ can be employed for the evaluation of these knowledge graphs.

Artificial Intelligence-based texture analysis


Texture analysis is ubiquitous, and it finds application in both biomedical and nanomaterial research. The ability to address it in an automated fashion is greatly beneficial. However, in most cases, visual analysis and custom-tailored approaches are employed. Convolutional neural networks (CNNs) represent a viable approach to characterize image texture accurately, and in particular properties that humans can detect: directionality and granularity.

NIST researchers have been addressing AI-controlled texture analysis for years; however, they have only used synthetic data to train artificial intelligence, not real-life data. To further advance the CNNs and our AI as a whole, we need to change the testing data to real-life images. The only barrier is that there is no efficient software allowing users to annotate real-life images to be then used for testing.

Another contribution of the GUI I created is associated with a step forward my NIST mentors are envisioning on this project. The software will enable the creation of a public database of annotated texture images that will be globally available to other scientists. Images annotated using our software will be uploaded to a public database where others can view, source, and use them. To the best of our knowledge, there is no global database that contains this information.

This would not only help researchers around the world train AIs but help advance machine learning texture analysis as a whole.

Evaluating the Implementation of NIST SP 800-181 in Cybersecurity-Related Job Descriptions


With technology and data science becoming so prominent in society, it’s becoming increasingly imperative that companies and organizations protect themselves from malicious cybersecurity threats. However, in the United States alone, there are over 700,000 unfilled cybersecurity positions. The National Initiative for Cybersecurity Education (NICE) created the NICE Framework (NIST SP 800-181) to provide a set of building blocks for describing the tasks, knowledge, and skills that are needed to perform cybersecurity work. Through these building blocks, the NICE Framework enables organizations to develop their workforces, and helps learners to engage in appropriate learning activities to develop their knowledge and skills.

The purpose of this research is to evaluate if employers are using this framework by examining job descriptions found on online hiring platforms and measuring the extent of their alignment to the Framework. The results of this research will provide insight into whether or not actions need to be taken to increase industry awareness of the Framework or to modify the Framework to better apply to employer needs.

Two methodologies will be explored to complete this project. In the first methodology, job descriptions from multiple hiring platforms such as LinkedIn and USAJobs will be graded using a rubric to determine how well they align with the Framework. A job description that matches a larger number of keywords found in the knowledge, skills, and tasks of a work role will score higher on the rubric. In the second methodology, matching keywords and qualifications will first be found between job descriptions. After compiling a list of the most common keywords and qualifications, this list will then be compared to the Framework work role to determine how well the Framework covers what employers desire.
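
The first methodology can be pictured as a simple keyword-coverage score. The sketch below is only one possible rubric: the scoring rule (fraction of keywords matched), the function name, and the keyword list are all hypothetical, and the keywords are invented for illustration rather than quoted from NIST SP 800-181.

```python
import re


def keyword_score(description, keywords):
    """Return the fraction of keywords found in the description, and the matches."""
    if not keywords:
        return 0.0, []
    text = description.lower()
    matched = [
        kw for kw in keywords
        if re.search(r"\b" + re.escape(kw.lower()) + r"\b", text)
    ]
    return len(matched) / len(keywords), matched


# Invented keywords for a hypothetical work role.
role_keywords = [
    "incident response",
    "risk assessment",
    "network monitoring",
    "penetration testing",
]

job_description = (
    "Seeking an analyst to lead incident response and perform "
    "network monitoring for enterprise systems."
)

score, matched = keyword_score(job_description, role_keywords)
print(score, matched)
```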

Optimizing Data Communication for Low Latency Quantum Network Metrology


Quantum networks currently require various in-situ measurements from their components to ensure good network fidelity. Communications between quantum network nodes are carried out with single photons through the use of single-photon sources and single-photon detectors. One undesirable characteristic of these photon transmissions is substantial timing jitter associated with the single-photon detection process. To monitor this issue, each photon’s emission time and absorption time are recorded with picosecond accuracy and sent to the quantum network’s management system for analysis. This time-data transfer can become a considerable bottleneck in the network due to bandwidth limitations in classical data communication. Thus, we seek to reduce network overhead and optimally compress this data. In our investigation, we tested several lossless compression methods, such as delta encoding, different types of variable length quantity encoding, and a hybrid approach, on a sample of such data. We found that the hybrid approach produced the best results, compressing the data by 83.11% (a 5.92 compression ratio). Implementing this compression technique in quantum network metrology toolsets could significantly speed up quantum network analysis and allow more data to be analyzed.
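
To make the two named building blocks concrete, the sketch below chains delta encoding with one common variable length quantity scheme (base-128 bytes, least significant group first). It is not the hybrid approach evaluated in the project, whose details are not given here; the function names and the sample timestamps are hypothetical, and the timestamps are assumed to be non-negative integers in non-decreasing order.

```python
def delta_encode(values):
    """Replace each value by its difference from the previous one."""
    previous = 0
    deltas = []
    for value in values:
        deltas.append(value - previous)
        previous = value
    return deltas


def vlq_encode(number):
    """Encode a non-negative integer as base-128 bytes, low 7 bits first.

    The high bit of each byte signals that more bytes follow.
    """
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        byte = number & 0x7F
        number >>= 7
        if number:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def compress(timestamps):
    """Delta-encode sorted timestamps, then pack each delta as a VLQ."""
    return b"".join(vlq_encode(d) for d in delta_encode(timestamps))


def decompress(data):
    """Invert compress(): unpack the VLQs and accumulate the deltas."""
    values, current, shift, total = [], 0, 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            total += current
            values.append(total)
            current, shift = 0, 0
    return values


# Made-up picosecond timestamps.
timestamps = [
    1_000_000_000_000,
    1_000_000_012_500,
    1_000_000_025_130,
    1_000_000_037_480,
]
packed = compress(timestamps)
print(len(packed), "bytes instead of", 8 * len(timestamps))
```

Because consecutive timestamps are close together, the deltas are small and fit in few bytes. For reference, the two figures reported above are consistent with each other: a reduction of 83.11% leaves 16.89% of the original size, and 1 / 0.1689 is approximately 5.92.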

Understanding Neural Search Algorithms


Search engines have models for predicting whether a document is relevant to a query, and deep learning methods for predicting relevance are an emerging area of research. To determine whether a document is relevant to a query, search engines may use three different models. The first model is manual (non-automatic), where there is human intervention to determine whether a document is relevant or not. The next two models are considered automatic in that the query is created from the textual description of the user's information need. The first automatic model is traditional: it looks at how often terms appear in documents and uses formulas to calculate each document's relevance. The second automatic model is neural, where neural networks are used to determine the document’s relevance. The question then becomes: how do these three models compare with one another?
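
As an illustration of the traditional kind of model, the sketch below scores documents with a basic TF-IDF formula. TF-IDF is used here only as a familiar example of a term-frequency formula; the text does not say which formula the evaluated systems use, and the smoothing choice, function name, and sample documents are assumptions.

```python
import math
from collections import Counter


def tf_idf_scores(query, documents):
    """Score each document against the query with a simple TF-IDF sum."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if counts[term]:
                tf = counts[term] / len(tokens)
                # Smoothed inverse document frequency (an assumed variant).
                idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
                score += tf * idf
        scores.append(score)
    return scores


# Made-up documents and query.
documents = [
    "quantum network timing jitter",
    "video retrieval from text queries",
    "graph analytics on gpu clusters",
]
scores = tf_idf_scores("video retrieval", documents)
print(scores)
```

Documents that share no terms with the query score zero, which is one of the known blind spots of purely term-based ranking.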

To answer this question, we use a query-by-query analysis approach, examining traditional, neural, and manual outputs on many search queries and then trying to identify patterns of success and failure for each model. We then conducted a qualitative analysis of traditional, neural, and manual ranking methods to understand the differences.

Created November 16, 2022