Introduction

Welcome to the Genera API! This API provides access to the Genera fuzzy hashing data and to all backend operations available to an end user.

For more information on Genera visit the Nebula Information System homepage.

Authentication

To set up authorization, use this code:

export MYGENERA=https://generahost44.mycompany.com

curl -X POST -H "Content-Type: application/json" -d @get_token.json $MYGENERA/api/token/

This will return a token in a JSON body: {"token":"9b393f580ecbfe957cf46dc22dbb5fc8ec9a4473"}

Make sure your get_token.json file has the following format: { "username" : "myusername", "password" : "mysecretpassword123" }

The environment variable MYGENERA holds the base URL of your Genera server, for example: https://generahost44.mycompany.com

You can also validate a token via the API before use to make sure it has not expired or been revoked:

curl -H "Authorization: Token $TOKEN" $MYGENERA/api/token-validation/

If the token is valid you will receive a JSON body indicating so along with the specific user ID the token is associated with: {"valid_token":true,"userid":"3"}

If the token is invalid you will receive an authentication error.

Genera uses an API token key to allow access to the API. Once you have been given an account by the administrator, you can request and begin using a token.

Genera expects the API key to be included in all API requests to the server in an authorization header that looks like the following:

Authorization: Token 9b393f580ecbfe957cf46dc22dbb5fc8ec9a4473

You must replace 9b393f580ecbfe957cf46dc22dbb5fc8ec9a4473 with your personal API key.
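As a rough sketch of the same flow outside of curl, the following Python builds the token request and the Authorization header using only the standard library. The helper names (build_token_request, auth_header, fetch_token) are illustrative and not part of Genera; the endpoint path and JSON fields are taken from the curl example above.

```python
import json
import urllib.request


def build_token_request(base_url, username, password):
    """Build (but do not send) the POST request for /api/token/."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/token/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def auth_header(token):
    """Return the header Genera expects on every authenticated request."""
    return {"Authorization": "Token " + token}


def fetch_token(base_url, username, password):
    """Send the request and return the token string from the JSON reply."""
    request = build_token_request(base_url, username, password)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["token"]
```

Keeping request construction separate from sending makes it possible to inspect the URL and body without contacting a server.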

Corpora

Genera uses the concept of a corpus (the singular of corpora) to allow users to create and organize their scans during its fuzzy hashing operations. A corpus is identified by a UUID for API use and has a description for human readability as well.

When you specify a corpus during upload, the files extracted from the upload are only compared to files previously uploaded to the specified corpus. The uploaded files also then become part of that corpus. Thus for subsequent uploads the corpus continues to build upon itself. There is no limit to the number of corpora that may be created.

This allows any arbitrary organization of scanning required by the user to be implemented.

When you do not specify a corpus, a "default" corpus is used. Internally this is represented by a NULL UUID: 00000000-0000-0000-0000-000000000000, but that string is not required during API use. The absence of a corpus field causes the API to assume the default corpus.
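The default-corpus rule can be illustrated with a small client-side helper. This is only a sketch: upload_fields is a hypothetical name, and the corpus_uuid form field is the one used by the intake upload example later in this document.

```python
import uuid

# The NULL UUID that represents the default corpus internally.
DEFAULT_CORPUS = uuid.UUID(int=0)


def upload_fields(corpus_uuid=None):
    """Return the extra form fields for an upload.

    For the default corpus the field is simply omitted, since the
    absence of a corpus field makes the API assume the default corpus.
    """
    if corpus_uuid is None:
        return {}
    parsed = uuid.UUID(str(corpus_uuid))  # raises ValueError on malformed input
    if parsed == DEFAULT_CORPUS:
        return {}
    return {"corpus_uuid": str(parsed)}
```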

Corpus Creation

export TOKEN=9b393f580ecbfe957cf46dc22dbb5fc8ec9a4473

curl -H "Content-Type: application/json" -H "Authorization: Token $TOKEN" -X POST \
  -d '{ "description" : "Case number 947573", "sample_size" : 150 }' $MYGENERA/api/corpus-create/

The JSON enumerating the description may also be kept in a file and used with the @filename.json convention as in token retrieval.

The above command returns JSON structured like this:

{
  "id":6,
  "datestamp":"2022-06-16T18:03:34.581112Z",
  "description":"Case number 947573",
  "uuid":"70cbeaa5-3df4-484e-8d23-ab39651e91f6",
  "visibility":0,
  "sample_size":150
}

The visibility field is for experimental use at the moment.

This endpoint creates a corpus for general use.

HTTP Request

POST http://host.your.com/api/corpus-create

Query Parameters

Parameter	Default	Required	Description
description	no default	True	The human text that describes the corpus.
sample_size	installation dependent	False	The default fuzzy hash sample size, in bytes.

The sample_size parameter sets the default sampling size in bytes for the fuzzy hashing algorithm to use in this corpus. A typical default size is 140. The larger the sample size, the faster the algorithm can process data but at the cost of accuracy. A very small sample size like 8 or 12 will achieve higher accuracy but at a greater compute cost and longer processing times.
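A minimal sketch of building the request body for corpus creation follows. The function name is hypothetical, and the check that sample_size is positive is an assumption on the client side; the document does not state which values the server rejects.

```python
import json


def corpus_create_payload(description, sample_size=None):
    """Return the JSON body for corpus-create as a string."""
    if not description:
        raise ValueError("description is required")
    payload = {"description": description}
    if sample_size is not None:
        size = int(sample_size)
        if size <= 0:
            # Assumption: a sample size in bytes must be positive.
            raise ValueError("sample_size must be a positive number of bytes")
        payload["sample_size"] = size
    # Omitting sample_size leaves the installation default in effect.
    return json.dumps(payload)
```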

Corpus List

export TOKEN=9b393f580ecbfe957cf46dc22dbb5fc8ec9a4473

curl -H "Authorization: Token $TOKEN" $MYGENERA/api/corpus-list/

This will produce a JSON body with the array of the corpora for the user tied to the token.

[
    {
        "id": 7,
        "datestamp": "2022-06-16T20:43:20.869365Z",
        "description": "Centos binaries",
        "uuid": "2e70cd7b-5356-4de1-aaac-3202b74eee2b",
        "visibility": 0,
        "sample_size": 140
    }
]

HTTP Request

GET http://host.your.com/api/corpus-list
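Because uploads refer to a corpus by UUID while people tend to remember the description, a client may want to look one up from the other. The helper below is a sketch under the assumption that descriptions are unique for the user; nothing in this document guarantees that, so the function fails loudly when it finds zero or several matches.

```python
def corpus_uuid_for(corpora, description):
    """Return the UUID of the single corpus whose description matches exactly.

    corpora is the parsed JSON array returned by the corpus-list endpoint.
    """
    matches = [c["uuid"] for c in corpora if c["description"] == description]
    if len(matches) != 1:
        raise LookupError("expected exactly one corpus, found %d" % len(matches))
    return matches[0]
```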

File Intake

The intake file endpoint is the upload interface to push bundles of files into Genera for analysis.

Currently, individual files and .tar or .tar.gz archives are supported. More upload types may be supported in the future as requested.

When a file is uploaded, an intake file object is created in the database. Specifying a corpus with the upload of the intake file makes it part of that corpus.

Intake File Creation and Enumeration

curl -F "file=@tctar1.tar" -F "corpus_uuid=2e70cd7b-5356-4de1-aaac-3202b74eee2b" \
  -H "Authorization: Token $TOKEN" $MYGENERA/api/intake-file-create/

or without specifying a corpus

curl -F "file=@tctar1.tar" -H "Authorization: Token $TOKEN" $MYGENERA/api/intake-file-create/

Upon success, this will yield a JSON body similar to:

{
  "id":874,
  "datestamp":"2022-06-17T00:41:35.834612Z",
  "filename":"tctar1.tar",
  "filetype":"POSIX tar archive (GNU)",
  "status":0,
  "scan_epoch":1655426495.834612,
  "corpus":2
}

To retrieve the intake file created with the above command for status purposes, use the detail endpoint and provide the ID from the creation step.

curl -H "Authorization: Token $TOKEN" $MYGENERA/api/intake-file-detail/874/

This will output the JSON body below with the latest status as indicated by the status field.

{
  "id":874,
  "datestamp":"2022-06-17T00:41:35.834612Z",
  "filename":"tctar1.tar",
  "filetype":"POSIX tar archive (GNU)",
  "status":5,
  "scan_epoch":1655426495.834612,
  "corpus":2
}

To get a listing of intake files in the default corpus use this command:

curl -H "Authorization: Token $TOKEN" $MYGENERA/api/intake-file-list/

This will produce an array as a JSON body with all intake files enumerated in the user's default corpus.

Use the "intake-file-list-all" variation to get a listing and enumeration of all intake files belonging to the user who holds that API token.

[
    {
        "id": 875,
        "datestamp": "2022-06-17T12:05:45.481936Z",
        "filename": "tctar6.tar",
        "filetype": "POSIX tar archive (GNU)",
        "status": 5,
        "scan_epoch": 1655467545.481936,
        "corpus": null
    },
    {
        "id": 876,
        "datestamp": "2022-06-17T12:06:22.255392Z",
        "filename": "tctar5.tar",
        "filetype": "POSIX tar archive (GNU)",
        "status": 5,
        "scan_epoch": 1655467582.255392,
        "corpus": null
    }
]

curl -H "Authorization: Token $TOKEN" $MYGENERA/api/intake-file-list/55/

By providing a corpus id at the end of the URL, only the intake files of that corpus will be listed.

HTTP Request

POST http://host.your.com/api/intake-file-create

HTTP Request

GET http://host.your.com/api/intake-file-list/<Corpus ID>

HTTP Request

GET http://host.your.com/api/intake-file-detail/<ID>

HTTP Request

GET http://host.your.com/api/intake-file-list/

HTTP Request

GET http://host.your.com/api/intake-file-list-all/

For any intake JSON record there are seven possible states, as indicated by the "status" field:

ERROR = -1
WAITING = 0
IN_PROGRESS = 1
FUZZY_COMPLETE = 2
PEERING = 3
RECONCILING = 4
RECONCILED = 5

When an intake file is created it is given the initial state of WAITING.

Once the backend discovers it in the waiting state it will be moved to IN_PROGRESS. After the initial unpacking and signature analysis of the file set that was uploaded is completed, the state will move to FUZZY_COMPLETE.

At this point the next stage begins: the system looks for peers in the corpus of previously analyzed files. It reconciles the "distances" between files already in the corpus and the new files being added to it. Once reconciliation is complete, the scan of that particular intake file is complete and the state is moved to RECONCILED.

The endpoints used to retrieve the data about discovered relationship distances can then assume the state of those relationships is final.
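The state list above maps naturally onto an enumeration plus a polling loop. The sketch below assumes the caller supplies fetch_record, a hypothetical callable that returns the intake-file-detail JSON as a dict. Treating ERROR as a terminal state is an assumption; the text only describes RECONCILED as final.

```python
import enum
import time


class IntakeStatus(enum.IntEnum):
    ERROR = -1
    WAITING = 0
    IN_PROGRESS = 1
    FUZZY_COMPLETE = 2
    PEERING = 3
    RECONCILING = 4
    RECONCILED = 5


def is_finished(status):
    """True for RECONCILED, and (by assumption) for ERROR."""
    return IntakeStatus(status) in (IntakeStatus.ERROR, IntakeStatus.RECONCILED)


def wait_for_intake(fetch_record, interval=5.0, max_polls=120, sleep=time.sleep):
    """Poll fetch_record() until the intake file reaches a terminal state."""
    for _ in range(max_polls):
        record = fetch_record()
        if is_finished(record["status"]):
            return IntakeStatus(record["status"])
        sleep(interval)
    raise TimeoutError("intake file did not finish in time")
```

The interval and max_polls defaults are arbitrary placeholders; suitable values depend on the size of the upload and the installation.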

Filetype Discovery Endpoints

One of the first operations performed after intake receives a tarball is that all of the files are inventoried and grouped together based on two parameters:

Linux file type (as provided by libmagic)
Sample sizes used on each specific type

The two parameters together are used to define uniqueness between all of the files recursively processed.

As mentioned previously, the sample size may be adjusted in order to obtain a signature from a file. For example, assume a tar file containing some Linux binaries was submitted.

There may be three distinct file types discovered by libmagic during the intake scan. If a few of the files are deemed to be small and the sampling size is adjusted for them, the system will designate each variation in sample size as an additional unique file type.

Typically the list endpoint is avoided because it can generate a lot of data. The detail endpoint is preferred since it retrieves a single epoch's worth of filetypes. To use the detail endpoint, only the scan_epoch needs to be known. The scan_epoch can be found using the intake-file endpoints.
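One way to go from an intake file record to the detail URL is sketched below. It parses the JSON with parse_float=str so that scan_epoch is carried into the URL with exactly the digits the server sent, rather than passing through a binary float. Whether the server tolerates other formatting of the epoch is not stated in this document, so preserving the text is the cautious choice. The function names are illustrative.

```python
import json


def scan_epoch_text(intake_json):
    """Return scan_epoch exactly as written in the JSON text.

    parse_float=str hands every JSON float to str() unparsed, so no
    digits are added or lost on the way to the URL.
    """
    record = json.loads(intake_json, parse_float=str)
    return str(record["scan_epoch"])


def filetypes_detail_url(base_url, intake_json):
    """Build the filetypes-discovered-detail URL for an intake record."""
    return "{}/api/filetypes-discovered-detail/{}/".format(
        base_url.rstrip("/"), scan_epoch_text(intake_json)
    )
```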

Filetype Discovery Enumeration

You would use this command to enumerate all of the filetypes discovered listings for a user.

curl -H "Authorization: Token $TOKEN" $MYGENERA/api/filetypes-discovered-list/

Which would output a JSON body such as:

[
    {
        "uid": 4,
        "corpus": "00000000-0000-0000-0000-000000000000",
        "scan_epoch": 1655467545.481936,
        "scan_epoch_human": "2022-06-17T12:05:45.481936",
        "type_tuple_set": [
            {
                "sample_size": 140,
                "type_id64": "4677550038310303849",
                "file_type": "Mach-O 64-bit x86_64 executable, flags:<NOUNDEFS|DYLDLINK|TWOLEVEL|PIE>"
            }
        ]
    },
    {
        "uid": 4,
        "corpus": "00000000-0000-0000-0000-000000000000",
        "scan_epoch": 1655471891.949814,
        "scan_epoch_human": "2022-06-17T13:18:11.949814",
        "type_tuple_set": [
            {
                "sample_size": 140,
                "type_id64": "4677550038310303849",
                "file_type": "Mach-O 64-bit x86_64 executable, flags:<NOUNDEFS|DYLDLINK|TWOLEVEL|PIE>"
            }
        ]
    },
    {
        "uid": 4,
        "corpus": "01b71d29-f777-4fe4-b58b-c184e7dfb304",
        "scan_epoch": 1655476675.539139,
        "scan_epoch_human": "2022-06-17T14:37:55.539139",
        "type_tuple_set": [
            {
                "sample_size": 180,
                "type_id64": "4677550038310303849",
                "file_type": "Mach-O 64-bit x86_64 executable, flags:<NOUNDEFS|DYLDLINK|TWOLEVEL|PIE>"
            }
        ]
    },
    {
        "uid": 4,
        "corpus": "00000000-0000-0000-0000-000000000000",
        "scan_epoch": 1655478324.80587,
        "scan_epoch_human": "2022-06-17T15:05:24.805870",
        "type_tuple_set": [
            {
                "sample_size": 140,
                "type_id64": "4677550038310303849",
                "file_type": "Mach-O 64-bit x86_64 executable, flags:<NOUNDEFS|DYLDLINK|TWOLEVEL|PIE>"
            }
        ]
    }
]

The detail endpoint retrieves a single specific list. This endpoint is preferred over the list endpoint for efficiency reasons.

curl -H "Authorization: Token $TOKEN" $MYGENERA/api/filetypes-discovered-detail/1654778121.080117/

Sample result:

[
    {
        "uid": 4,
        "corpus": "00000000-0000-0000-0000-000000000000",
        "scan_epoch": 1654778121.080117,
        "scan_epoch_human": "2022-06-09T12:35:21.080117",
        "type_tuple_set": [
            {
                "sample_size": 6,
                "type_id64": "-1779486992384273777",
                "file_type": "ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV)"
            },
            {
                "sample_size": 6,
                "type_id64": "-979060716482693493",
                "file_type": "ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV)"
            },
            {
                "sample_size": 6,
                "type_id64": "1448779365430216709",
                "file_type": "ELF 64-bit LSB executable, x86-64, version 1 (SYSV)"
            },
            {
                "sample_size": 140,
                "type_id64": "-1779486992384273777",
                "file_type": "ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV)"
            },
            {
                "sample_size": 140,
                "type_id64": "-979060716482693493",
                "file_type": "ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV)"
            },
            {
                "sample_size": 140,
                "type_id64": "1448779365430216709",
                "file_type": "ELF 64-bit LSB executable, x86-64, version 1 (SYSV)"
            }
        ]
    }
]

HTTP Request

GET http://host.your.com/api/filetypes-discovered-list/

HTTP Request

GET http://host.your.com/api/filetypes-discovered-detail/<epoch>

Metadata

Genera allows its users to create their own definitions and descriptions of the files present across their corpora via a simple upload.

This metadata is then incorporated automatically in API calls documented below so that correlation can be done by the user between what the user knows about a particular SHA256 and the matches produced during any subsequent scan.

The file format currently supported is a simple space-delimited, multiline text file with one record per line and the SHA256 of the file as the second field.

For example:

tar_out_centos_7 0x61cab33bf4a1317f9c65a0187dfc73dd7555b5edcbd5d939c06c228670605fb0 archive_0.tar 389-ds-base-1.3.10.2-6.el7.x86_64/usr/bin/pwdhash
tar_out_centos_7 0x67e2c32023dedbce77424a135a5f5d9840aaa051db2c608ade949e39bd164943 archive_0.tar 389-ds-base-1.3.10.2-6.el7.x86_64/usr/bin/ldif

There is no limit to the number of fields present after the SHA256 field. When the uploaded file is processed, the record stored by Genera will consist of the first field followed by all the fields after the SHA256, delimited by a single space.

The first example line above would result in the following record in the keystore:

uid	sha256	meta
4	0x61cab33bf4a1317f9c65a0187dfc73dd7555b5edcbd5d939c06c228670605fb0	tar_out_centos_7 archive_0.tar 389-ds-base-1.3.10.2-6.el7.x86_64/usr/bin/pwdhash

In order to alter the metadata, simply upload new records; the existing records will be overwritten.
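The field rule described above (first field, then everything after the SHA256, joined by single spaces) can be reproduced locally to preview what a line will look like once stored. This is a client-side sketch with a hypothetical function name, not the server's implementation.

```python
def parse_metadata_line(line):
    """Split one uploaded line into (sha256, meta).

    meta is the first field followed by all fields after the SHA256,
    joined by single spaces.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected at least a first field and a SHA256")
    sha256 = fields[1]
    meta = " ".join([fields[0]] + fields[2:])
    return sha256, meta
```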

Metadata Upload

There is a single endpoint for upload, described in the section below. Note that once a metadata file is successfully uploaded, it will be batch processed. This means that the only way to determine if the batch processing is complete is to poll the system with a query looking for the last record (line) of the uploaded file. Typically, processing 100K lines takes less than a minute.

HTTP Request

POST http://host.your.com/api/metadata-file-create/

The metadata file upload is very similar to the intake file upload.

curl -F "file=@shalog_all.txt" -H "Authorization: Token $TOKEN" $MYGENERA/api/metadata-file-create/

The system will echo some information back to the user upon a successful upload. Currently, the status field will always be zero. This will be fixed in a future release.

{
  "id":12,
  "datestamp":"2022-06-20T14:47:09.585897Z",
  "filename":"shalog_all.txt",
  "filetype":"ASCII text",
  "status":0
}

Metadata Retrieval

In order to determine if the processing is completed, query to find the record corresponding to the last line of the uploaded file.

For example, suppose the last SHA256 of the batch being processed is 0867d1c7243b03fed1c6399ebf9e2c1748f22801c27ac52e8deb113b62443daa. Once its record can be retrieved, processing of the batch is complete.

HTTP Request

GET http://host.your.com/api/file-metadata/<SHA256>

The above endpoint will retrieve the metadata for an individual SHA256.

curl -H "Authorization: Token $TOKEN" $MYGENERA/api/file-metadata/0x0867d1c7243b03fed1c6399ebf9e2c1748f22801c27ac52e8deb113b62443daa/

The SHA256 parameter may be prefixed by "0x" or simply be the hexadecimal string of the SHA256. Both are supported.
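To poll for completion, a client needs the SHA256 from the last line of the file it uploaded. The sketch below extracts it and accepts either the "0x" form or the bare hexadecimal form; the helper names are hypothetical, and lowercasing the digest is a client-side normalization choice rather than documented server behavior.

```python
import string


def normalize_sha256(value):
    """Strip an optional 0x prefix and check for exactly 64 hex digits."""
    digest = value[2:] if value.lower().startswith("0x") else value
    if len(digest) != 64 or any(c not in string.hexdigits for c in digest):
        raise ValueError("not a SHA256 hex string: %r" % value)
    return digest.lower()


def last_sha256(metadata_text):
    """Return the SHA256 (second field) of the last non-blank line."""
    lines = [ln for ln in metadata_text.splitlines() if ln.strip()]
    return normalize_sha256(lines[-1].split()[1])


def file_metadata_url(base_url, sha256):
    """Build the file-metadata URL for one SHA256."""
    return "{}/api/file-metadata/{}/".format(
        base_url.rstrip("/"), normalize_sha256(sha256)
    )
```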

Scan History

The scan history endpoint retrieves the low level detail of the files with a specific type from a specific scan and contains an array of all SHA256 values which were processed. This array can be very large at times because it is the entire set of SHA256 values processed in the scan.

Scan History Detail

In order to perform a query for scan history detail, the filetypes discovered endpoint is leveraged to obtain filetype information.

HTTP Request

GET http://host.your.com/api/scan-history-detail/<sample_size>/<scan_epoch>/<type_id64>/

Query Parameters

Parameter	Default	Required	Description
sample_size	no default	True	The sample size from the filetypes discovered endpoint query
scan_epoch	no default	True	The scan epoch field value retrieved from the filetypes discovered endpoint query
type_id64	no default	True	The type_id64 field retrieved from the filetypes discovered endpoint query

From an example filetypes discovered endpoint query, if the JSON returned is:

[
    {
        "uid": 4,
        "corpus": "00000000-0000-0000-0000-000000000000",
        "scan_epoch": 1655821834.390254,
        "scan_epoch_human": "2022-06-21T14:30:34.390254",
        "type_tuple_set": [
            {
                "sample_size": 6,
                "type_id64": "-9021735175523411617",
                "file_type": "PDF document, version 1.3"
            },
            {
                "sample_size": 140,
                "type_id64": "-1779486992384273777",
                "file_type": "ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV)"
            },
            {
                "sample_size": 140,
                "type_id64": "4677550038310303849",
                "file_type": "Mach-O 64-bit x86_64 executable, flags:<NOUNDEFS|DYLDLINK|TWOLEVEL|PIE>"
            }
        ]
    }
]

From this data there are three potential scan history detail queries which will retrieve the histories. The potential query parameters would be as follows:

6, 1655821834.390254, -9021735175523411617

140, 1655821834.390254, -1779486992384273777

140, 1655821834.390254, 4677550038310303849

The query for the last of the three would look like this:

curl -H "Authorization: Token $TOKEN" $MYGENERA/api/scan-history-detail/140/1655821834.390254/4677550038310303849/

This would produce a JSON body such as:

[
    {
        "scan_set": [
            "1fddcbd58fcc6f0cb2dbceba1d4f5c03a8a9d7e5a12bfa76667f35eba0b710fc",
            "5447da38e862d623a6e0c6956b082f9da083d7dd9baedc7ff0632a1aebd71f5c",
            "c55a6bd1e9143432c5a85a5ea561ef281da336dde5ffd3ad08e0c47b1093f778"
        ],
        "uid": "4",
        "scan_epoch": 1655821834.390254,
        "scan_epoch_human": "2022-06-21T14:30:34.390254"
    }
]

What this ultimately indicates is that for the intake file submitted by the user with id 4 on 2022-06-21 at 14:30:34.390254, there were three files of type Mach-O 64-bit x86_64 and their SHA256 values are listed in the scan set array.
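The step from a filetypes discovered record to its scan history detail queries, as walked through above, could be sketched as follows. The function name is illustrative; the URL layout follows the HTTP Request line in this section.

```python
def scan_history_urls(base_url, filetypes_record):
    """Return one scan-history-detail URL per entry in type_tuple_set.

    filetypes_record is a single element of the JSON array returned by
    the filetypes discovered endpoints.
    """
    base = base_url.rstrip("/")
    epoch = filetypes_record["scan_epoch"]
    urls = []
    for entry in filetypes_record["type_tuple_set"]:
        urls.append(
            "{}/api/scan-history-detail/{}/{}/{}/".format(
                base, entry["sample_size"], epoch, entry["type_id64"]
            )
        )
    return urls
```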

Peering Data

Peering data reflects the "distance" between files and also the distance between segments of those files (segments are also called chunks). A distance is a non-negative integer with no defined upper bound. A distance of zero indicates that the similarity between the two segments or files being compared is extremely high. Note that during processing, comparison and scoring operations avoid comparing a file to itself, based on the SHA256 hash string which is held as a reference to every file as it is processed. Therefore, a distance of zero will never indicate that two files are exactly the same, because a distance value always represents the comparison between two unique SHA256 entities. However, it is possible for individual chunks of two distinct files to score a distance of zero because those chunks are in fact exact matches of each other, byte for byte.

Entity Peer Data

When a scan is executed, two sets of peers are generated as a result. The first set is the result of comparing the files unpacked from the tarball against each other. The second set is the result of comparing each new file signature to the signatures of the corpus, if one is specified. If no corpus was specified, the default corpus is used for the corpus-wide comparison. Scoring data for these two sets is retrieved by the epoch-entity-peer-data and corpus-entity-peer-data endpoints, respectively.

HTTP Request

GET http://host.your.com/api/epoch-entity-peer-data/<SHA256>/<epoch>/

GET http://host.your.com/api/corpus-entity-peer-data/<SHA256>/<epoch>/<corpus>

Retrieve the entity peer data for a given file by SHA256 and scan epoch. Epoch values are retrieved by using the intake-file-list or intake-file-detail endpoints.

curl -H "Authorization: Token $TOKEN" \
  $MYGENERA/api/epoch-entity-peer-data/c55a6bd1e9143432c5a85a5ea561ef281da336dde5ffd3ad08e0c47b1093f778/1656708168.487598/

Example JSON returned:

{
    "entity_peer_data": [
        {
            "corpus": "9b84e860-c59e-5afd-a03e-d91e8d02d0fd",
            "corpus_type": 0,
            "distance": 58,
            "peer_hint": -1,
            "peer_set": [
                {
                    "peer_index": -1,
                    "sha256": "5447da38e862d623a6e0c6956b082f9da083d7dd9baedc7ff0632a1aebd71f5c"
                }
            ],
            "sample_size": 140,
            "uid": 4,
            "via_variant": false
        },
        {
            "corpus": "9b84e860-c59e-5afd-a03e-d91e8d02d0fd",
            "corpus_type": 0,
            "distance": 1089,
            "peer_hint": -2,
            "peer_set": [
                {
                    "peer_index": -1,
                    "sha256": "1fddcbd58fcc6f0cb2dbceba1d4f5c03a8a9d7e5a12bfa76667f35eba0b710fc"
                }
            ],
            "sample_size": 140,
            "uid": 4,
            "via_variant": false
        }
    ]
}

Entity Peer Data Record Fields

The JSON returned from these two endpoints will be an array of records which have fields enumerating distance and rank along with some other data.

In the case of a GET using epoch-entity-peer-data endpoints the fields included are defined as:

corpus : In the case of the epoch-entity-peer-data endpoint, this is a version 5 UUID which is built from three fields, the user id and the epoch value along with the domain of the NULL UUID (00000000-0000-0000-0000-000000000000). In the case of the corpus-entity-peer-data, this will be the corpus specified in the request. Note that if a corpus is not specified (same format as the epoch endpoint) the default corpus will be assumed.
corpus_type : This is always 0 for an epoch entity peer.
distance : The distance value between the peer(file) specified in the request URL and the peer or peers listed in the peer_set.
peer_hint : This value represents the rank in closeness. For entity peers, this is a negative number with -1 being the closest of all the peer distances listed, -2 being the second closest, etc.
peer_set : The peer_set is an array of records with two fields: the SHA256 hash of the matching file and a peer_index. The peer_index value is currently insignificant in entity peer records and can be ignored; it is relevant in chunk peering records and is included here for parsing and data storage consistency. It may be used in future versions of the API. As implied by the name set, multiple SHA256 values indicate that multiple entities (files) were found to have the same overall distance from the file specified in the request URL.
sample_size : This is the sample size that was used in the comparison.
uid : The user id which initiated the scan and the "owner" of this resulting data.
via_variant : A boolean field indicating whether the distances were found using variant based processing.
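Using the field definitions above, a client might flatten an entity peer response into a ranked list. This is a sketch only; ranked_entity_peers is a hypothetical helper, and it converts the negative peer_hint into a positive rank purely for readability.

```python
def ranked_entity_peers(body):
    """Return (rank, distance, [sha256, ...]) tuples, closest peer first.

    body is the parsed JSON object from an entity peer data endpoint.
    """
    rows = []
    for record in body["entity_peer_data"]:
        rank = -record["peer_hint"]  # peer_hint -1 is closest, so rank 1
        hashes = [peer["sha256"] for peer in record["peer_set"]]
        rows.append((rank, record["distance"], hashes))
    return sorted(rows, key=lambda row: row[0])
```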

Chunk Peer Data

Each comparison of individual files involves an algorithmic comparison of one or more "chunks" of those files depending on the size of the file and the complexity of its binary content. These chunks are not uniform in size, but they are linear from a standpoint that there is a first chunk followed by a second chunk followed by up to some fixed number of chunks. In other words, the chunks are based on the content of the binary and not fixed block size markers of the bytes of the file from beginning to end.

HTTP Request

GET http://host.your.com/api/epoch-chunk-peer-data/<SHA256>/<epoch>/

GET http://host.your.com/api/corpus-chunk-peer-data/<SHA256>/<epoch>/<corpus>

This command:

curl -H "Authorization: Token $TOKEN" \
  $MYGENERA/api/epoch-chunk-peer-data/c55a6bd1e9143432c5a85a5ea561ef281da336dde5ffd3ad08e0c47b1093f778/1656949324.305865/

will retrieve chunk peer data for the hash c55a6bd1e9143432c5a85a5ea561ef281da336dde5ffd3ad08e0c47b1093f778 when it was processed during the scan initiated at timestamp 1656949324.305865.

The resulting output will have a JSON body such as:

{
    "chunk_peer_data": [
        {
            "corpus": "f96db40b-1b71-5d19-900a-4b0861cb758e",
            "corpus_type": 0,
            "distance": 8,
            "peer_hint": 3,
            "peer_set": [
                {
                    "peer_index": 3,
                    "sha256": "5447da38e862d623a6e0c6956b082f9da083d7dd9baedc7ff0632a1aebd71f5c"
                }
            ],
            "sample_size": 130,
            "uid": 4,
            "via_variant": false
        },
        {
            "corpus": "f96db40b-1b71-5d19-900a-4b0861cb758e",
            "corpus_type": 0,
            "distance": 25,
            "peer_hint": 2,
            "peer_set": [
                {
                    "peer_index": 2,
                    "sha256": "5447da38e862d623a6e0c6956b082f9da083d7dd9baedc7ff0632a1aebd71f5c"
                }
            ],
            "sample_size": 130,
            "uid": 4,
            "via_variant": false
        },
        {
            "corpus": "f96db40b-1b71-5d19-900a-4b0861cb758e",
            "corpus_type": 0,
            "distance": 24,
            "peer_hint": 1,
            "peer_set": [
                {
                    "peer_index": 1,
                    "sha256": "5447da38e862d623a6e0c6956b082f9da083d7dd9baedc7ff0632a1aebd71f5c"
                }
            ],
            "sample_size": 130,
            "uid": 4,
            "via_variant": false
        },
        {
            "corpus": "f96db40b-1b71-5d19-900a-4b0861cb758e",
            "corpus_type": 0,
            "distance": 25,
            "peer_hint": 0,
            "peer_set": [
                {
                    "peer_index": 0,
                    "sha256": "5447da38e862d623a6e0c6956b082f9da083d7dd9baedc7ff0632a1aebd71f5c"
                }
            ],
            "sample_size": 130,
            "uid": 4,
            "via_variant": false
        }
    ]
}
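The sample output suggests, although the text does not state it, that in chunk peer records peer_hint is the index of the chunk in the requested file and peer_index is the index of the matching chunk in the peer file. Under that assumption, the following sketch lists the chunk matches from most to least similar. The helper name is hypothetical.

```python
def chunk_matches(body):
    """Return (distance, chunk, peer_sha256, peer_chunk) tuples, closest first.

    Assumption: peer_hint is the chunk index in the requested file and
    peer_index is the chunk index in the peer file.
    """
    matches = []
    for record in body["chunk_peer_data"]:
        for peer in record["peer_set"]:
            matches.append(
                (record["distance"], record["peer_hint"], peer["sha256"], peer["peer_index"])
            )
    # Sort by distance, then by chunk index for a stable, readable order.
    return sorted(matches, key=lambda m: (m[0], m[1]))
```

Applied to the sample response above, the first entry would be chunk 3 at distance 8.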

Detailed Distance Data

The Genera API allows you to retrieve detailed distance data for any pair of hashes that have been processed during scanning. This is useful when there is a need to simply treat the data as a searchable data lake, and also to gain a deeper understanding of the chunk-oriented output.

HTTP Requests

GET http://host.your.com/api/distance-by-sha256/<SHA256>/<OtherSHA256>/<SampleSize>/

With a specific sample size specified:

curl -H "Authorization: Token $TOKEN" \
  $MYGENERA/api/distance-by-sha256/5447da38e862d623a6e0c6956b082f9da083d7dd9baedc7ff0632a1aebd71f5c/c55a6bd1e9143432c5a85a5ea561ef281da336dde5ffd3ad08e0c47b1093f778/130/

Without a specific sample size as part of the URL:

curl -H "Authorization: Token $TOKEN" \
  $MYGENERA/api/distance-by-sha256/5447da38e862d623a6e0c6956b082f9da083d7dd9baedc7ff0632a1aebd71f5c/c55a6bd1e9143432c5a85a5ea561ef281da336dde5ffd3ad08e0c47b1093f778/

The latter will generate a list of records:

{
    "distance_by_sha256": [
        {
            "chunk_section_distance_set": [
                {
                    "distance": 28,
                    "index": 0,
                    "other_index": 0
                },
                {
                    "distance": 843,
                    "index": 0,
                    "other_index": 1
                },
                {
                    "distance": 866,
                    "index": 0,
                    "other_index": 2
                },
                {
                    "distance": 735,
                    "index": 0,
                    "other_index": 3
                },
                {
                    "distance": 843,
                    "index": 1,
                    "other_index": 0
                },
                {
                    "distance": 12,
                    "index": 1,
                    "other_index": 1
                },
                {
                    "distance": 907,
                    "index": 1,
                    "other_index": 2
                },
                {
                    "distance": 830,
                    "index": 1,
                    "other_index": 3
                },
                {
                    "distance": 864,
                    "index": 2,
                    "other_index": 0
                },
                {
                    "distance": 915,
                    "index": 2,
                    "other_index": 1
                },
                {
                    "distance": 10,
                    "index": 2,
                    "other_index": 2
                },
                {
                    "distance": 769,
                    "index": 2,
                    "other_index": 3
                },
                {
                    "distance": 733,
                    "index": 3,
                    "other_index": 0
                },
                {
                    "distance": 830,
                    "index": 3,
                    "other_index": 1
                },
                {
                    "distance": 775,
                    "index": 3,
                    "other_index": 2
                },
                {
                    "distance": 8,
                    "index": 3,
                    "other_index": 3
                }
            ],
            "entity_distance": 58,
            "other_sha256": "c55a6bd1e9143432c5a85a5ea561ef281da336dde5ffd3ad08e0c47b1093f778",
            "sample_size": 140,
            "sha256": "5447da38e862d623a6e0c6956b082f9da083d7dd9baedc7ff0632a1aebd71f5c",
            "uid": 4
        },
        {
            "chunk_section_distance_set": [
                {
                    "distance": 25,
                    "index": 0,
                    "other_index": 0
                },
                {
                    "distance": 781,
                    "index": 0,
                    "other_index": 1
                },
                {
                    "distance": 830,
                    "index": 0,
                    "other_index": 2
                },
                {
                    "distance": 777,
                    "index": 0,
                    "other_index": 3
                },
                {
                    "distance": 792,
                    "index": 1,
                    "other_index": 0
                },
                {
                    "distance": 24,
                    "index": 1,
                    "other_index": 1
                },
                {
                    "distance": 849,
                    "index": 1,
                    "other_index": 2
                },
                {
                    "distance": 820,
                    "index": 1,
                    "other_index": 3
                },
                {
                    "distance": 826,
                    "index": 2,
                    "other_index": 0
                },
                {
                    "distance": 864,
                    "index": 2,
                    "other_index": 1
                },
                {
                    "distance": 25,
                    "index": 2,
                    "other_index": 2
                },
                {
                    "distance": 828,
                    "index": 2,
                    "other_index": 3
                },
                {
                    "distance": 758,
                    "index": 3,
                    "other_index": 0
                },
                {
                    "distance": 830,
                    "index": 3,
                    "other_index": 1
                },
                {
                    "distance": 835,
                    "index": 3,
                    "other_index": 2
                },
                {
                    "distance": 8,
                    "index": 3,
                    "other_index": 3
                }
            ],
            "entity_distance": 82,
            "other_sha256": "c55a6bd1e9143432c5a85a5ea561ef281da336dde5ffd3ad08e0c47b1093f778",
            "sample_size": 130,
            "sha256": "5447da38e862d623a6e0c6956b082f9da083d7dd9baedc7ff0632a1aebd71f5c",
            "uid": 4
        }
    ]
}
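Each record's chunk_section_distance_set is a flattened matrix of chunk-to-chunk distances. The sketch below rebuilds the matrix and picks, for every chunk of the first file, the closest chunk of the other file. The function names are illustrative, and the code assumes index and other_index are zero-based, as they are in the sample output.

```python
def distance_matrix(record):
    """Arrange chunk_section_distance_set as matrix[index][other_index]."""
    cells = record["chunk_section_distance_set"]
    rows = max(c["index"] for c in cells) + 1
    cols = max(c["other_index"] for c in cells) + 1
    matrix = [[None] * cols for _ in range(rows)]
    for c in cells:
        matrix[c["index"]][c["other_index"]] = c["distance"]
    return matrix


def best_matches(record):
    """Map each chunk index to (other_index, distance) of its closest chunk."""
    best = {}
    for i, row in enumerate(distance_matrix(record)):
        candidates = [(d, j) for j, d in enumerate(row) if d is not None]
        if candidates:
            d, j = min(candidates)
            best[i] = (j, d)
    return best
```

In the sample records above, the smallest distance in every row lies on the diagonal, meaning each chunk is closest to the chunk at the same position in the other file.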

Errors

The Genera API uses the following error codes:

Error Code	Meaning
400	Bad Request -- Your request is invalid.
401	Unauthorized -- Your API key is wrong.
204	No Content -- The specified resource could not be found (similar to Not Found).
406	Not Acceptable -- You requested a format that isn't JSON.
500	Internal Server Error -- We had a problem with our server. Try again later.
503	Service Unavailable -- We're temporarily offline for maintenance. Please try again later.
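A client could translate the codes in the table into a handful of actions. The mapping below is a sketch; the action labels are invented for illustration and the grouping (for example, retrying on 500 and 503) follows the wording of the table rather than any documented client behavior.

```python
def classify_status(code):
    """Map an HTTP status code from Genera to a suggested client action."""
    if code == 204:
        return "not_found"       # No Content: the resource could not be found
    if code == 401:
        return "reauthenticate"  # the API key was rejected
    if code in (500, 503):
        return "retry_later"     # server problem or maintenance window
    if code in (400, 406):
        return "fix_request"     # invalid request or unsupported format
    if 200 <= code < 300:
        return "ok"
    return "unexpected"
```

Note that 204 has to be checked before the general 2xx range, since Genera uses it to signal a missing resource.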