marfcat submission Tomcat 5.5.13   v. SATE.5
============================================

These selected reports cover apache-tomcat-5.5.13 and were produced using a
small subset of the available algorithms.

CVE-based training and reporting:

The reports contain line numbers machine-learned from the _train.xml file, as
well as location types and descriptions, provided by the SATE organizers, that
were incorporated into the reports via machine learning. The location types,
such as "fix", "sink", or "path", were learned from the provided XML,
spreadsheet, and source code files.

Two of the submitted report-*.xml files are the best ones; their macro precision
using machine learning techniques is 83.72%. The stats-*.txt files summarize the
evaluation precision. The results are only as good as the training data given: if
there are mistakes in the data selection and annotation XML files, the results
will contain corresponding mistakes.
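For reference, macro precision is commonly defined as the unweighted mean of the
per-class precisions, so each class (here a CVE or CWE) counts equally no matter
how many samples it has. The sketch below computes it under that textbook
definition only; the function name and the parallel-list input format are
hypothetical, and the exact definition behind the stats-*.txt files may differ.

```python
from collections import defaultdict


def macro_precision(expected, predicted):
    """Unweighted mean of per-class precision (textbook definition, assumed).

    expected, predicted: parallel lists of class labels (hypothetical format).
    Precision of class c = correct predictions of c / all predictions of c.
    Classes that are never predicted are skipped rather than counted as 0.
    """
    true_pos = defaultdict(int)
    predicted_count = defaultdict(int)
    for exp, pred in zip(expected, predicted):
        predicted_count[pred] += 1
        if exp == pred:
            true_pos[pred] += 1
    if not predicted_count:
        return 0.0
    per_class = [true_pos[c] / predicted_count[c] for c in predicted_count]
    return sum(per_class) / len(per_class)
```

With expected labels A, A, B, C and predictions A, B, B, C, the per-class
precisions are 1.0, 0.5 and 1.0, giving a macro precision of about 0.833.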

The best reports are:

  report-noprepreprawfftcheb-apache-tomcat-5.5.13-train-cve.xml
  report-noprepreprawfftdiff-apache-tomcat-5.5.13-train-cve.xml (does not validate three tool-specific lines)

Other reports are, to varying degrees of detail and noise:

  report-noprepreprawfftcos-apache-tomcat-5.5.13-train-cve.xml (does not validate two lines)
  report-noprepreprawffteucl-apache-tomcat-5.5.13-train-cve.xml (does not validate three tool-specific lines)
  report-noprepreprawffthamming-apache-tomcat-5.5.13-train-cve.xml
  report-noprepreprawfftmink-apache-tomcat-5.5.13-train-cve.xml
  report-nopreprepcharunigramadddelta-apache-tomcat-5.5.13-train-cve-nlp.xml
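The report names appear to encode the processing pipeline: no pre-preprocessing,
raw (pass-through) preprocessing, FFT feature extraction, and a distance-based
classifier (cheb, cos, diff, eucl, hamming, mink). The following is a minimal
sketch of that general idea, not MARFCAT's implementation: it treats a file's
bytes as a signal, uses the magnitudes of a discrete Fourier transform as the
feature vector, and returns the label of the nearest training example (a plain
1-nearest-neighbour rule chosen for brevity). The window size n = 16, the
Minkowski order, and all function names are assumptions. Diff is omitted because
it is a MARF-specific measure, and Hamming is omitted because counting differing
positions is not meaningful on real-valued spectra without extra quantization.

```python
import cmath
import math


def fft_features(data: bytes, n: int = 16) -> list[float]:
    """Magnitude spectrum of the first n bytes of `data`, zero-padded to n.

    Uses a naive O(n^2) DFT, which yields the same values as an FFT, to keep
    the sketch dependency-free. Only bins 0..n//2 are kept because the
    spectrum of a real-valued signal is symmetric.
    """
    signal = list(data[:n]) + [0] * max(0, n - len(data))
    return [
        abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minkowski(a, b, p=3.0):
    # p = 3 is an arbitrary choice; p = 2 reduces to the Euclidean distance.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def cosine_distance(a, b):
    # 1 - cosine similarity; two all-zero vectors are treated as maximally distant.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def classify(sample: bytes, training: dict[str, list[bytes]], distance=chebyshev) -> str:
    """1-nearest-neighbour: label of the training example closest to `sample`."""
    features = fft_features(sample)
    best_label, best_dist = None, math.inf
    for label, examples in training.items():
        for example in examples:
            d = distance(features, fft_features(example))
            if d < best_dist:
                best_label, best_dist = label, d
    if best_label is None:
        raise ValueError("training set is empty")
    return best_label
```

Swapping the `distance` argument (chebyshev, euclidean, minkowski,
cosine_distance) mirrors how the report variants differ only in their last
pipeline stage, while the feature extraction stays the same.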

The -nlp reports use NLP techniques with machine learning instead of
signal processing techniques. These reports are largely comparable, but
have lower recall (possibly still a bug), i.e. some CVEs are missing from
the reports entirely :(
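Judging by its name, the nopreprep-char-unigram-add-delta configuration is a
character-unigram language model with add-delta smoothing. The sketch below
illustrates that general technique, not MARFCAT's code: each class gets a
smoothed byte-frequency model, P(c) = (count(c) + delta) / (N + delta * V), and a
sample is assigned to the class under which its log-likelihood is highest. The
value delta = 0.5, the byte-level alphabet of V = 256 symbols, and the function
names are assumptions.

```python
import math
from collections import Counter

VOCAB_SIZE = 256  # assumption: byte-level alphabet


def train_unigram(texts: list[bytes], delta: float = 0.5):
    """Return a log-probability function for an add-delta smoothed unigram model."""
    counts = Counter()
    for text in texts:
        counts.update(text)  # iterating over bytes yields int symbols 0..255
    denom = sum(counts.values()) + delta * VOCAB_SIZE

    def log_prob(symbol: int) -> float:
        # Unseen symbols get delta / denom instead of zero probability.
        return math.log((counts[symbol] + delta) / denom)

    return log_prob


def log_likelihood(sample: bytes, log_prob) -> float:
    # Unigram model: symbols are treated as independent, so log-probabilities add.
    return sum(log_prob(b) for b in sample)


def classify(sample: bytes, training: dict[str, list[bytes]], delta: float = 0.5) -> str:
    """Pick the class whose smoothed unigram model best explains `sample`."""
    models = {label: train_unigram(texts, delta) for label, texts in training.items()}
    return max(models, key=lambda label: log_likelihood(sample, models[label]))
```

Because a unigram model ignores symbol order, it only captures character
frequencies, which is one plausible reason such a model can miss classes that
a richer feature set picks up.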

Some reports have problems with tool-specific ranks, such as
4.199735736674989E-4; I will have to see how to reduce these.


CWE-based training and reporting:

The CWE-based reports use the CWE instead of the CVE as the primary
class for training and reporting, and as such currently do not
report on CVEs directly; however, their recognition rates in the
same locations, types, etc. are not very low either. A future
version of the tool is planned to combine the CVE and CWE machine
learning pipeline runs to improve mutual classification, but this
is not available yet. The CWE-based training is also applied to
the test files, e.g. those of Pebble, to see whether weaknesses
similar to those of Tomcat are found there. Unlike CVEs, which are
mostly project-specific, CWEs make better cross-project classes
because they are largely project-independent. Both the CVE-based
and CWE-based methods use the same training data. CWEs are
recognized correctly at a rate of 81.82% for Tomcat. NLP-based CWE
testing is not included because its precision was quite low (39%).

The best reports are:

  report-cweidnoprepreprawfftcheb-apache-tomcat-5.5.13-train-cwe.xml (does not validate)
  report-cweidnoprepreprawfftdiff-apache-tomcat-5.5.13-train-cwe.xml (does not validate)

Other reports are, to varying degrees of detail and noise:

  report-cweidnoprepreprawfftcos-apache-tomcat-5.5.13-train-cwe.xml
  report-cweidnoprepreprawffteucl-apache-tomcat-5.5.13-train-cwe.xml (does not validate)
  report-cweidnoprepreprawffthamming-apache-tomcat-5.5.13-train-cwe.xml
  report-cweidnoprepreprawfftmink-apache-tomcat-5.5.13-train-cwe.xml


Log files
---------

The corresponding *.log files are included for reference, but they contain
A LOT of debug output from the tool. The tool uses thresholding to reduce
the amount of noise going into the reports, but the logs are included for
anyone curious to examine them.
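This README does not say what exactly is thresholded or at what value. As a
purely hypothetical illustration of the idea, one way to keep noise out of a
report is to drop candidate entries whose classifier distance exceeds a cut-off
and list the remaining ones closest first. The Candidate record, its fields, and
the cut-off are assumptions, not the tool's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical report entry: file, line, predicted class, and distance."""
    path: str
    line: int
    label: str
    distance: float


def filter_candidates(candidates, max_distance):
    """Keep only entries at or below the distance cut-off, closest first."""
    kept = [c for c in candidates if c.distance <= max_distance]
    return sorted(kept, key=lambda c: c.distance)
```

Raising max_distance trades precision for recall: more entries reach the
report, but more of them are likely to be noise.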

Files:
------

apache-tomcat-5.5.13-src_train.xml (meta training file)
marfcat-cweid-nopreprep-raw-fft-cheb.log
marfcat-cweid-nopreprep-raw-fft-cos.log
marfcat-cweid-nopreprep-raw-fft-diff.log
marfcat-cweid-nopreprep-raw-fft-eucl.log
marfcat-cweid-nopreprep-raw-fft-hamming.log
marfcat-cweid-nopreprep-raw-fft-mink.log
marfcat-nopreprep-char-unigram-add-delta.log
marfcat-nopreprep-raw-fft-cheb.log
marfcat-nopreprep-raw-fft-cos.log
marfcat-nopreprep-raw-fft-diff.log
marfcat-nopreprep-raw-fft-eucl.log
marfcat-nopreprep-raw-fft-hamming.log
marfcat-nopreprep-raw-fft-mink.log
marfcat--super-fast-tomcat-train-cve.log
marfcat--super-fast-tomcat-train-cve-nlp.log
marfcat--super-fast-tomcat-train-cwe.log
README.txt
report-cweidnoprepreprawfftcheb-apache-tomcat-5.5.13-train-cwe.xml
report-cweidnoprepreprawfftcos-apache-tomcat-5.5.13-train-cwe.xml
report-cweidnoprepreprawfftdiff-apache-tomcat-5.5.13-train-cwe.xml
report-cweidnoprepreprawffteucl-apache-tomcat-5.5.13-train-cwe.xml
report-cweidnoprepreprawffthamming-apache-tomcat-5.5.13-train-cwe.xml
report-cweidnoprepreprawfftmink-apache-tomcat-5.5.13-train-cwe.xml
report-nopreprepcharunigramadddelta-apache-tomcat-5.5.13-train-cve-nlp.xml
report-noprepreprawfftcheb-apache-tomcat-5.5.13-train-cve.xml
report-noprepreprawfftcos-apache-tomcat-5.5.13-train-cve.xml
report-noprepreprawfftdiff-apache-tomcat-5.5.13-train-cve.xml
report-noprepreprawffteucl-apache-tomcat-5.5.13-train-cve.xml
report-noprepreprawffthamming-apache-tomcat-5.5.13-train-cve.xml
report-noprepreprawfftmink-apache-tomcat-5.5.13-train-cve.xml
stats-per-cve-nlp.txt
stats-per-cve.txt
stats-per-cwe.txt


--
Serguei A. Mokhov
mokhov@cse.concordia.ca
