Roadmap

Philippe Ombredanne edited this page Oct 6, 2017 · 25 revisions

This is a high-level list of what we are working on and what has been completed.

Legend

✅ Completed 🕥 In progress ⬜ Planned, not started

Work in progress

(see Completed features below)

Packages and dependencies

License detection

  • 🕥 support and detect license expressions (code in https://github.com/nexB/license-expression/ )
  • 🕥 support and detect composite licenses
  • ⬜ support custom licenses
  • ⬜ move licenses data set to external separate repository
  • ⬜ Improved unknown license detection
  • 🕥 sync with external sources (DejaCode, SPDX, etc.)

Copyrights

  • ⬜ speed up copyright detection
  • ⬜ improved detected lines range
  • ⬜ streamline grammar of copyright parser
  • ⬜ normalize holders and authors for summarization
  • ⬜ normalize and streamline results data format

Core features

  • ✅ pre-scan filtering (ignore binaries, etc.)
  • ✅ plugins! (developed as part of GSoC by @yadsharaf)
  • 🕥 support Python 3 #295
  • 🕥 transparent archive extraction (as opposed to on-demand with extractcode)
  • 🕥 .scancode configuration file for exclusions, defaults, scan failure conditions, etc.
  • ⬜ support scan pipelines and rules to organize more complex scans
  • ⬜ scan baselining, delta scan and failure conditions (such as license change, etc)
  • ⬜ deduplication and similarity detection to avoid re-scanning. For now, only identical files are detected and scanned just once.
  • ⬜ Improved logging
  • 🕥 native support for ABC Data (See https://github.com/nexB/aboutcode/blob/master/aboutcode-data/README.rst )

Classification, summarization and deduction

  • 🕥 File classification #426
  • ⬜ summarize and aggregate data #377

Source code support

Compiled code support

Data exchange

  • ⬜ SPDX data conversion #338

Packaging

  • ⬜ simpler installation, automated installer
  • ⬜ distro-friendly packaging
  • ⬜ unbundle and package as multiple libraries (commoncode, extractcode, etc.)

Documentation

  • ⬜ integration in a build/CI loop
  • ⬜ end to end guide to analyze a codebase
  • ⬜ hacking guides
  • ⬜ API doc when using ScanCode as a library

CI integration

  • ⬜ Plugins for CI (Jenkins, etc)
  • ⬜ Integration for CI (Travis, Appveyor, Drone, etc)

Other work in progress

Package mining and matching

(Note that this will be spun off into its own project.) Some code is in https://github.com/nexB/scancode-toolkit-contrib/

  • 🕥 exact matching
  • 🕥 attribute-based matching
  • 🕥 fuzzy matching
  • ⬜ peer-reviewed meta packages repo
  • ⬜ basic mining of package repositories

Other

  • ⬜ Crypto code detection

Completed features

Core scans

  • ✅ exact license detection
  • ✅ approximate license detection
  • ✅ copyright detection
  • ✅ file information (size, type, etc.)
  • ✅ URLs, emails, authors

Outputs and UI

  • ✅ JSON compact and pretty
  • ✅ plain HTML tables, also usable in a spreadsheet
  • ✅ fancy HTML 'app' with a file tree navigation, and scan results filtering, search and sorting
  • ✅ improved scan GUI, now its own project: https://github.com/nexB/aboutcode-manager
  • ✅ simple scan summary
  • ✅ SPDX output

Packages and dependencies

  • ✅ common model for packages data
  • ✅ basic support for common package formats
  • ✅ RPM packages base
  • ✅ NuGet packages base
  • ✅ Python packages base
  • ✅ PHP Composer packages support with dependencies
  • ✅ Java Maven POM packages support with dependencies
  • ✅ npm packages support with dependencies

Speed!

  • ✅ accelerate license detection indexing and scanning; include caching
  • ✅ scan using multiple processes to speed up overall scan
  • ✅ cache per-file scan to disk and stream final results

Other

  • ✅ archive extraction with extractcode
  • ✅ conversion of scan results to CSV
  • ✅ improved error handling, verbose and diagnostic output