
Security best practices: Data integrity and authenticity


Certified variables

Security concern

ICP offers three modes of operation for canisters: update, query, and composite_query. For simplicity, composite_query calls are grouped with queries for the rest of this section. For more information, view the detailed overview of the differences between update and query calls. As explained in the overview, update calls are slow and expensive but provide integrity guarantees, as their responses include a threshold signature from the subnet.

On the other hand, query calls are fast because a single replica formulates the response, but they offer no integrity guarantee: the response can be manipulated by a single malicious replica or boundary node. For example, if the NNS dapp fetches proposal information from the governance canister via query calls and the responding node is malicious, the node can modify the proposal payload in the response to make an ill-intentioned proposal that causes irrevocable damage look innocuous, misleading voters into voting yes. Another consequence of query calls is that users can't rely on canister_inspect_message as a guard. This makes query calls, in their raw form, unfit to serve data for security-critical applications.

Using certified variables for secure queries

In certain use cases, there is a third option whereby query calls return data that has been certified by the subnet in an earlier update call. This is the concept of certified data. It requires changes in three places: the update call must create the certification, the query call must return the certificate, and the front-end must verify the certificate. Using certified data provides the best of both worlds, with query-like response times and update-like certified responses. This forms the core of certified variables.

Some examples of certified variables are asset certification in Internet Identity, NNS dapp, or the canister signature implementation in Internet Identity.

Certified variables are an advanced feature that requires careful implementation of authenticated data structures on the canister side and of verification on the client side. If the client doesn't require fast response times, call the query method as an update call (replicated query). The response is then certified by the subnet, and a single malicious replica or boundary node can't modify it.

ICP also provides replica-signed queries, where query responses are signed by the answering replica node. However, they don't have the same security guarantees as an update call and only protect against malicious boundary nodes. Replica-signed queries are enabled by default in both agent-rs and agent-js. Read more about the feature.

What is certified data?

Aside from update calls, the subnet certifies (creates a threshold signature over) a part of the canister data every round. This data is stored in the state tree under the label certified_data. However, because it is certified every round, the amount of data that can be stored in certified_data is limited to 32 bytes. Hence, when you modify the state of your canister during an update call, if you can convert the state into a unique representation that fits into 32 bytes, you can store it under certified_data and it will be certified. Naturally, this can be done by computing a hash over the data structure that holds the canister state. This is also why certified variables are difficult to implement: depending on your data structures, you will need to develop a different kind of hashing function.
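As a minimal sketch of the idea of reducing canister state to 32 bytes, the following hashes a sorted map into a fixed-size digest. Everything here is an assumption made for illustration: the state shape (a BTreeMap from index to name), the function names, and above all toy_hash, which is a non-cryptographic stand-in built from the standard library so the sketch has no dependencies. A real canister would use SHA-256 and an authenticated data structure such as the one shown later in this section.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::Hasher;

/// The size of certified_data: 32 bytes.
type Digest = [u8; 32];

/// Toy 32-byte hash, NOT cryptographic. Stand-in for SHA-256 (assumption).
fn toy_hash(data: &[u8]) -> Digest {
    let mut out = [0u8; 32];
    for lane in 0u8..4 {
        let mut hasher = DefaultHasher::new();
        hasher.write(&[lane]); // separate the four 8-byte lanes
        hasher.write(data);
        let start = usize::from(lane) * 8;
        out[start..start + 8].copy_from_slice(&hasher.finish().to_be_bytes());
    }
    out
}

/// Reduces the whole state to one digest that fits into certified_data.
/// BTreeMap iterates in key order, so the encoding does not depend on
/// insertion order. Each value is length-prefixed to keep it unambiguous.
fn state_digest(state: &BTreeMap<u64, String>) -> Digest {
    let mut encoded = Vec::new();
    for (index, name) in state {
        encoded.extend_from_slice(&index.to_be_bytes());
        encoded.extend_from_slice(&(name.len() as u64).to_be_bytes());
        encoded.extend_from_slice(name.as_bytes());
    }
    toy_hash(&encoded)
}
```

In an update call, the resulting digest is what would be passed to the system API that sets certified_data. A flat hash like this forces the client to receive the entire state to recheck it, which is why tree-shaped structures with partial proofs are preferred in practice.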

Subsequent query calls can return the data as-is, together with the signature on the certified_data, which the front-end can verify with the IC root public key. This means that data aggregation or other calculations can't be done in query calls, as there would be no way to produce a signature over the newly created data. There are two workarounds: either the data is precomputed in the update call, or all raw data is sent to the front-end, which verifies it and does the calculations itself. Combining these features, a canister should be able to certify a variable in a query response with this design.
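The first workaround, precomputing in the update call, might look like the sketch below. The AgeStats type and both function names are hypothetical and chosen only for illustration. The point is that the aggregate is written during the update, so the query path only reads a stored value that can be placed in the certified structure under its own key.

```rust
/// Aggregates maintained by update calls (hypothetical type).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct AgeStats {
    pub count: u64,
    pub total_age: u64,
    /// Precomputed in the update path so that queries never calculate it.
    pub average_age: u64,
}

/// Update path: record a new age and refresh the stored aggregate.
pub fn record_user(stats: &mut AgeStats, age: u8) {
    stats.count += 1;
    stats.total_age += u64::from(age);
    // count is at least 1 here, so the division cannot fail.
    stats.average_age = stats.total_age / stats.count;
}

/// Query path: return the stored bytes as-is. These bytes are what would be
/// inserted as a leaf of the certified tree during the update call.
pub fn average_age_leaf(stats: &AgeStats) -> [u8; 8] {
    stats.average_age.to_be_bytes()
}
```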

At a high level, in your canister:

  1. Choose an authenticated data structure, such as a Merkle tree, to store a value in canister memory.
  2. In the update call,
  • Perform the computation and store the result in the Merkle tree.
  • The lookup path for the result must act as its key. Ideally, this key should be the parameters provided by the caller in the query method.
  • Recompute the root hash of the Merkle tree (root_hash).
  • Store the root_hash as the canister's certified data.
  • Return the key as the response.
  3. In the query call,
  • Fetch the result from the Merkle structure using the query parameters as the lookup path.
  • Fetch the current certified_data for the canister.
  • Compute the witness for the result using the same lookup path. The Merkle witness provides proof of inclusion that the requested result exists in the Merkle tree under the given path.
  • Return (result, certified_data, witness) as the response.
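The relationship between root_hash and witness in the steps above can be sketched with a plain binary Merkle tree. This is not the labeled hash tree format that ICP uses: leaves are addressed by position rather than by lookup path, all names are hypothetical, and toy_hash is a non-cryptographic stand-in for SHA-256 so that the sketch needs only the standard library.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

type Digest = [u8; 32];

/// Toy 32-byte hash, NOT cryptographic. Stand-in for SHA-256 (assumption).
fn toy_hash(data: &[u8]) -> Digest {
    let mut out = [0u8; 32];
    for lane in 0u8..4 {
        let mut hasher = DefaultHasher::new();
        hasher.write(&[lane]);
        hasher.write(data);
        let start = usize::from(lane) * 8;
        out[start..start + 8].copy_from_slice(&hasher.finish().to_be_bytes());
    }
    out
}

fn leaf_digest(value: &[u8]) -> Digest {
    toy_hash(&[&b"leaf:"[..], value].concat())
}

fn node_digest(left: &Digest, right: &Digest) -> Digest {
    toy_hash(&[&b"node:"[..], &left[..], &right[..]].concat())
}

/// One witness step: a sibling digest and the side it sits on.
#[derive(Clone, Debug, PartialEq)]
pub struct Step {
    sibling: Digest,
    sibling_is_left: bool,
}

/// Update side: computes the root (to store as certified data) and the
/// witness for the leaf at `index`.
pub fn root_and_witness(leaves: &[Vec<u8>], index: usize) -> (Digest, Vec<Step>) {
    assert!(index < leaves.len(), "index out of range");
    let mut level: Vec<Digest> = leaves.iter().map(|l| leaf_digest(l)).collect();
    let mut pos = index;
    let mut witness = Vec::new();
    while level.len() > 1 {
        let sibling = pos ^ 1;
        if sibling < level.len() {
            witness.push(Step { sibling: level[sibling], sibling_is_left: sibling < pos });
        }
        let mut next = Vec::with_capacity((level.len() + 1) / 2);
        for pair in level.chunks(2) {
            match pair {
                [left, right] => next.push(node_digest(left, right)),
                [single] => next.push(*single), // odd node is promoted unchanged
                _ => unreachable!(),
            }
        }
        pos /= 2;
        level = next;
    }
    (level[0], witness)
}

/// Client side: recompute the root from the value and the witness, then
/// compare it with the root that the subnet certified.
pub fn verify(value: &[u8], witness: &[Step], certified_root: &Digest) -> bool {
    let mut acc = leaf_digest(value);
    for step in witness {
        acc = if step.sibling_is_left {
            node_digest(&step.sibling, &acc)
        } else {
            node_digest(&acc, &step.sibling)
        };
    }
    &acc == certified_root
}
```

The witness carries only one sibling digest per tree level, so the response stays small even when the tree holds many entries, while any change to the returned value produces a different root and fails the comparison.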

The rest of the section shows an example canister that serves a certified response to a query using certified_data, which is then verified in the front-end. The examples are written in Rust and Motoko, but the overall design can be implemented in other languages.

Building a canister with certified variables

Let's consider the following canister interface:

type User = record {
name: text;
age: nat8;
};

type CertifiedUser = record {
user : User;
certificate : blob;
witness : blob;
};

service : {
"set_user": (User) -> (nat64);
"get_user": (nat64) -> (CertifiedUser) query;
}

The canister exposes the following service:

  • set_user: The caller provides a User object to the canister. The canister records it and serves a corresponding index for the entry as the response. Since certified_data can only store 32 bytes of data, it uses a specialized data structure from ic_certified_map to store the User data.
    • The data structure internally stores the data in a HashTree (or Merkle tree) and records the root_hash of the data structure in the certified_data, which is 32 bytes.
    • The root_hash cryptographically guarantees that only one tree can correspond to that hash. The root_hash is also referred to as the Merkle proof.
  • get_user: The caller provides an index: nat64 to the canister and gets a certified response for the corresponding User. The CertifiedUser response must have the following structure so that the response can be verified:
    • user: The actual response.
    • certificate: The payload for verifying the signature on the certified_data. ICP provides the system API data_certificate() for this.
    • witness: Allows for the final verification of the response to be completed with the requested input and certified_data.
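A small sketch of how a lookup path such as /user/&lt;index&gt; can be encoded as byte labels, mirroring the b"user" label and index.to_be_bytes() key that the example canister uses. The helper name user_path is hypothetical. The choice of big-endian bytes is worth noting: assuming the map orders its keys by their bytes, big-endian encoding keeps keys in numeric order, whereas little-endian encoding would not.

```rust
/// Builds the two labels of the lookup path /user/<index>.
/// Hypothetical helper for illustration only.
pub fn user_path(index: u64) -> [Vec<u8>; 2] {
    [b"user".to_vec(), index.to_be_bytes().to_vec()]
}
```

For example, user_path(1) yields the label user followed by the eight bytes 00 00 00 00 00 00 00 01.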

You can find an example implementation of the canister below.

use candid::CandidType;
use ic_certified_map::HashTree;
use ic_certified_map::{leaf_hash, AsHashTree, Hash, RbTree};
use serde::{Deserialize, Serialize};
use std::borrow::Cow;
use std::cell::Cell;
use std::cell::RefCell;

#[derive(CandidType, Serialize, Deserialize, Clone)]
struct User {
    name: String,
    age: u8,
}

// A User is stored as a leaf whose content is its CBOR encoding.
impl AsHashTree for User {
    fn root_hash(&self) -> Hash {
        let user_serialized = serde_cbor::to_vec(&self).unwrap();
        leaf_hash(&user_serialized[..])
    }
    fn as_hash_tree(&self) -> HashTree<'_> {
        HashTree::Leaf(Cow::from(serde_cbor::to_vec(&self).unwrap()))
    }
}

#[derive(CandidType)]
struct CertifiedUser {
    user: User,
    certificate: Vec<u8>,
    witness: Vec<u8>,
}

thread_local! {
    // Counter used to hand out the next user index.
    static INDEX : Cell<u64> = Cell::new(0);
    // Outer tree keyed by "user", inner tree keyed by the big-endian index.
    static TREE: RefCell<RbTree<&'static str, RbTree<[u8; 8], User>>> = RefCell::new(RbTree::new());
}

#[ic_cdk::update]
fn set_user(user: User) -> u64 {
    let index = INDEX.with(|index| {
        let count = index.get() + 1;
        index.set(count);
        count
    });

    TREE.with_borrow_mut(|tree| {
        match tree.get(b"user") {
            Some(_) => {
                tree.modify(b"user", |inner| {
                    inner.insert(index.to_be_bytes(), user);
                });
            }
            None => {
                let mut inner = RbTree::new();
                inner.insert(index.to_be_bytes(), user);
                tree.insert("user", inner);
            }
        }
        // Certify the new root hash (32 bytes) of the whole tree.
        ic_cdk::api::set_certified_data(&tree.root_hash());
    });
    index
}

#[ic_cdk::query]
fn get_user(index: u64) -> CertifiedUser {
    let certificate = ic_cdk::api::data_certificate().expect("No data certificate available");

    TREE.with_borrow(|tree| {
        let user = match tree.get(b"user") {
            Some(inner) => {
                let user = inner.get(&index.to_be_bytes()[..]).expect("User not found");
                user.to_owned()
            }
            None => {
                panic!("Tree isn't initialized");
            }
        };

        // Serialize the witness for the path /user/<index> as self-describing CBOR.
        let mut witness = vec![];
        let mut witness_serializer = serde_cbor::Serializer::new(&mut witness);
        let _ = witness_serializer.self_describe();
        tree.nested_witness(b"user", |inner| inner.witness(&index.to_be_bytes()[..]))
            .serialize(&mut witness_serializer)
            .unwrap();

        CertifiedUser {
            user,
            certificate,
            witness,
        }
    })
}

Verifying certified variables

Once you have the CertifiedUser response, the front-end must verify the certification in it to obtain the integrity guarantee. The verification is broken down into several steps, implemented in the Rust and JavaScript example below.

The example has some extra steps to set up the canister with some User data before verification. You can ignore the section between the markers // ==== START of canister data setup and // ==== END of canister data setup.

  1. Verify the IC certificate: Recompute the root_hash of certificate.tree (the pruned state tree that contains the canister's certified_data) and verify certificate.signature with the root_hash as the message, certificate.delegation, and the IC root_key as the public key. This confirms that the signature is valid for the current state tree.
  2. Validate that the response is not stale by verifying that the time at /time in certificate.tree is within a certain delta of the current time. The recommended delta is 5 minutes, but it should be adapted to the use case.
  3. Recompute the root_hash of the witness and verify that it equals the certified_data. The certified_data can be obtained from certificate.tree under the path /canister/<canister_id>/certified_data.
  4. Check that the query parameters are in the witness. In this example, the lookup path is /user/<index> and should be present in the witness.
  5. Validate that the value found at /user/<index> matches user from the response.
  6. If all of the previous steps succeed, return user as the valid response.
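Taken in isolation, the freshness check in step 2 can be sketched as follows. The helper is_fresh is hypothetical and is not the implementation used by any library. It assumes both timestamps are nanoseconds since the Unix epoch and rejects certificates that lie too far in the past or in the future.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Five minutes in nanoseconds, the delta recommended above.
pub const MAX_CERT_TIME_OFFSET_NS: u128 = 300_000_000_000;

/// Returns true if the certificate time is within `max_offset_ns` of `now_ns`.
/// Hypothetical helper for illustration only.
pub fn is_fresh(cert_time_ns: u128, now_ns: u128, max_offset_ns: u128) -> bool {
    cert_time_ns.abs_diff(now_ns) <= max_offset_ns
}

/// Current time in nanoseconds since the Unix epoch.
pub fn now_ns() -> u128 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("Time went backwards")
        .as_nanos()
}
```

A shorter delta narrows the window in which an old but validly signed response can be replayed, at the cost of rejecting more responses when the client clock is skewed.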

use arbitrary::{Arbitrary, Unstructured};
use candid::Encode;
use candid::Principal;
use candid::{CandidType, Decode, Deserialize};
use futures::future::join_all;
use ic_agent::identity::AnonymousIdentity;
use ic_agent::Agent;
use ic_certificate_verification::validate_certificate_time;
use ic_certificate_verification::VerifyCertificate;
use ic_certification::hash_tree::HashTree;
use ic_certification::{Certificate, LookupResult};
use rand::prelude::*;
use serde_cbor::Deserializer;
use std::time::{SystemTime, UNIX_EPOCH};

#[derive(CandidType, Deserialize, Debug, PartialEq, Eq, Arbitrary)]
struct User {
    name: String,
    age: u8,
}

#[derive(CandidType, Deserialize)]
struct CertifiedUser {
    user: User,
    certificate: Vec<u8>,
    witness: Vec<u8>,
}

static URL: &str = "http://localhost:41749";
static CANISTER: &str = "a3shf-5eaaa-aaaaa-qaafa-cai";
const MAX_CERT_TIME_OFFSET_NS: u128 = 300_000_000_000; // 5 min
const MAX_CALLS: usize = 10;

#[tokio::main]
async fn main() {
    let agent = Agent::builder()
        .with_url(URL)
        .with_identity(AnonymousIdentity)
        .build()
        .expect("Unable to create agent");

    // This should be done only in demo environments.
    // When interacting with mainnet, hardcode the root_key.
    agent
        .fetch_root_key()
        .await
        .expect("Unable to fetch root key");
    let root_key = agent.read_root_key();

    let canister_id = Principal::from_text(CANISTER).unwrap();

    // ==== START of canister data setup
    let mut rng = rand::thread_rng();

    // Make MAX_CALLS to set_user
    let mut set_user_calls = Vec::new();
    for _ in 0..MAX_CALLS {
        let bytes: [u8; 16] = rng.gen();
        let mut u = Unstructured::new(&bytes[..]);
        let temp_user = User::arbitrary(&mut u).unwrap();

        println!("Calling set_user with {:?}", temp_user);
        let response = agent
            .update(&canister_id, "set_user")
            .with_effective_canister_id(canister_id)
            .with_arg(Encode!(&temp_user).unwrap())
            .call_and_wait();
        set_user_calls.push(response);
    }
    let results: Vec<u64> = join_all(set_user_calls)
        .await
        .into_iter()
        .map(|result| {
            Decode!(
                result
                    .expect("Update call set_user failed")
                    .as_slice(),
                u64
            )
            .unwrap()
        })
        .collect();

    // From response indexes, choose a random index for get_user
    let index: usize = rng.gen();
    let index: u64 = *results.get(index % MAX_CALLS).unwrap();
    // ==== END of canister data setup

    println!("Fetching index {:?}", index);

    let query_response = agent
        .query(&canister_id, "get_user")
        .with_effective_canister_id(canister_id)
        .with_arg(Encode!(&index).unwrap())
        .call()
        .await
        .expect("Unable to call query call get_user");

    let certified_user = Decode!(&query_response, CertifiedUser).unwrap();

    let mut deserializer = Deserializer::from_slice(&certified_user.certificate);
    let certificate: Certificate = serde::de::Deserialize::deserialize(&mut deserializer).unwrap();

    let start = SystemTime::now();
    let current_time = start
        .duration_since(UNIX_EPOCH)
        .expect("Time went backwards")
        .as_nanos();

    // Step 1: Check if signature in the certificate can be validated with the
    // root_hash of the tree in certificate as message and root_key as public_key
    let verification_result = certificate.verify(canister_id.as_slice(), &root_key[..]);

    println!(
        "Step 1: Digest match & Signature verification: {:?}",
        verification_result
    );

    // Step 2: Check if the response is not stale with the given time offset MAX_CERT_TIME_OFFSET_NS.
    let time_verification_result =
        validate_certificate_time(&certificate, &current_time, &MAX_CERT_TIME_OFFSET_NS);

    println!("Step 2: Time skew: {:?}", time_verification_result);

    // Step 3: Check if witness root_hash matches the certified_data
    let lookup_result =
        certificate
            .tree
            .lookup_path([b"canister", canister_id.as_slice(), b"certified_data"]);

    let certified_data: [u8; 32] = match lookup_result {
        LookupResult::Found(result) => result.try_into().unwrap(),
        _ => panic!("Certified data not found"),
    };

    let mut deserializer = Deserializer::from_slice(&certified_user.witness);
    let witness_decoded: HashTree<Vec<u8>> =
        serde::de::Deserialize::deserialize(&mut deserializer).unwrap();
    let witness_digest = witness_decoded.digest();

    println!(
        "Step 3: Witness digest matches certified data: {:?} ",
        witness_digest == certified_data
    );

    // Step 4: Check if the query parameters are in the witness
    let witness_lookup: User =
        match witness_decoded.lookup_path([b"user", &index.to_be_bytes()[..]]) {
            LookupResult::Found(result) => serde_cbor::from_slice(result).unwrap(),
            _ => panic!("user {} not found", index),
        };

    // Step 5: Check if the data found in Witness matches the returned result from the query.
    println!(
        "Step 4 & Step 5: Witness data matches User value: {:?}",
        witness_lookup == certified_user.user
    );

    // Step 6: Return the result
    println!("Result: {:?}", certified_user.user);
}

Use HTTP asset certification and avoid serving your dapp through raw.icp0.io

Security concern

Dapps on ICP can use asset certification to make sure the HTTP assets delivered to the browser are authentic (i.e., threshold-signed by the subnet). If an app does not use asset certification, it can only be served insecurely through raw.icp0.io, where no asset certification is checked. This is insecure, since a single malicious replica or boundary node can freely modify the assets delivered to the browser.

If an app is served through raw.icp0.io in addition to icp0.io, an adversary may trick users (phishing) into using the insecure raw.icp0.io.

Recommendation

  • Only serve assets through <canister-id>.icp0.io where the boundary nodes enforce response verification on the served assets. Do not serve through <canister-id>.raw.icp0.io.

  • Serve assets using the asset canister, which creates asset certification automatically, or add the ic-certificate header including the asset certification as e.g. done in the NNS dapp and Internet Identity.

  • Check in the canister’s http_request method if the request came through raw. If so, return an error and do not serve any assets.
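The raw check in the last recommendation could be sketched as below. It assumes that the Host header of the incoming request reaches the canister's http_request method unchanged. The helper name is_raw_host is hypothetical, and only the raw.icp0.io domain named above is covered.

```rust
/// Returns true if the Host header value points at the raw domain, for
/// example <canister-id>.raw.icp0.io. Hypothetical helper for illustration.
pub fn is_raw_host(host: &str) -> bool {
    let lower = host.trim().to_ascii_lowercase();
    // Drop an optional port before comparing the domain suffix.
    let name = lower.split(':').next().unwrap_or("");
    name.ends_with(".raw.icp0.io")
}
```

In http_request, a canister would look up the Host header and, when this check returns true, respond with an error status instead of serving the asset.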