[sysrepo-devel] Confirmed Commit Capability (RFC 6241 8.4)

ADAM WEAST ADAM.WEAST at adtran.com
Tue May 8 17:59:13 UTC 2018
Next message (by thread): [sysrepo-devel] Confirmed Commit Capability (RFC 6241 8.4)
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Mikael / Michal / Radek,

Thank you guys for your feedback.  We have thought about this and have come up with a netopeer2 based approach.  Please let us know what you think.

To implement confirmed commit, we would need to save some state data, possibly with a new struct in common.h. This struct would hold some basic information:

* bool persist - Indicates that this is a persistent confirmed commit or one tied to the life of a session
* nc_session* session - Points to the session that owns the commit if this is not a persistent commit
* char* persist_id - Holds the persistent id if this is a persistent commit
* possibly a mutex for protecting the struct?

A pointer for this struct will be added to the np2srv struct to allow global access. Only one confirmed commit can be active at a time, so it will be sufficient to have a single pointer. A null pointer would indicate that there is no active confirmed commit.

Below are the proposed changes to existing netopeer2 operations:
op_commit
    if the <confimed> tag is present:
        if there is already a confirmed commit present (e.g. np2srv->confirmed is not null):
            if this is not the correct session or the wrong persist_id is used:
                return with an error
            else:
                perform the normal sr_copy_config
                extend the timer
        else:
            get high-level access to sysrepo (discussion below)
            use similar logic to op_getconfig to get a libyang node of the current running datastore
            use libyang to write the datastore to disk
            create the confirmed commit struct with the necessary information and save to np2srv
            restore normal access permissions for sysrepo
            perform the normal sr_copy_config
            create and set the timeout timer
    else (normal or confirming commit):
        if there is already a confirmed commit present:
            if this is not the correct session or the wrong persist_id is used:
                return with an error
            else:
                perform the normal sr_copy_config
                delete previous datastore data from the persistent storage
                delete confirmed commit struct from np2srv
                clean up any timer data
        else:
            perform the normal sr_copy_config

op_cancelcommit (also used for non-persistent session disconnects, timer timeouts, and sessions killed with kill-session)
    if there is a confirmed commit present:
        if this is not the correct session or the wrong persist_id is used:
            return an error
        else:
            get high-level access to sysrepo (discussion below)
            use similar logic to op_getconfig to get a libyang node of the current running datastore
            use libyang to load the previous datastore state from persistent storage
            use lyd_diff to find the difference between previous and current datastore state
            use logic similar to op_editconfig to commit previous state to running
            restore normal access permissions
            delete previous datastore data from the persistent storage
            delete confirmed commit struct from np2srv
            clean up any timer data
    else:
        return an error

netopeer2 startup
    if there is a previous running datastore on persistent storage:
        // assume netopeer crashed in the middle of a confirmed commit
        // we need to revert it
        create temporary session to sysrepo with NACM disabled
        use similar logic to op_getconfig to get a libyang node of the current running datastore
        use libyang to load the previous datastore state from persistent storage
        use lyd_diff to find the difference between previous and current datastore state
        use logic similar to op_editconfig to commit previous state to running
        close temporary session

op_lock
When a user attempts to lock the running datastore, we will add an additional check to ensure there is not an active confirmed commit before granting the lock.

Caveats:

To save a backup of the running datastore and revert it successfully, we need a way to access all datastore nodes regardless of the permission level of the current user. This corresponds with the sections of the psuedo-code above where we 'get high-level access to sysrepo'. There are specific use cases we can see that would require this. Assume low-privilege user A starts a persistent confirmed commit. High-privilege user B could add a follow-up confirmed commit with changes to data that A would normally not have access to. If A issues a cancel-commit, we would expect the datastore to return to its previous state, including any hidden changes B made. To do this, netopeer2 needs full read and write access to the entire running datastore regardless of the privilege level of the user issuing the initial confirmed commit or cancel commit commands. We have a couple of ideas, but we would like your input regarding what you believe would be the best way.
* use sr_session_set_options to disable NACM on the current session. This may not be sufficient however due to the file ACLs on sysrepo's repository files.
* create a temporary root (or highest-privilege possible) session into sysrepo. This should give us access to all modules regardless of who installed them into sysrepo.
Beyond these ideas, we cannot see a way to backup and restore all data to and from sysrepo without adding new Client Library calls to sysrepo itself.

We also want to point out that we can only correctly manage locks on the running datastore if netopeer2 is the only client connected to sysrepo. According to the NETCONF RFC, users are not required to lock the datastore before starting a confirmed commit. However, if the datastore is not locked during an active confirmed commit, users cannot get a new lock on the running datastore until the commit is confirmed or cancelled. All of this logic would be managed within netopeer2 in this proposal. If a second, non-netopeer2 client connected to sysrepo, it could place a sysrepo lock on the running datastore during a netopeer2 confirmed commit. This would leave netopeer2 in a bad state if it attempted to cancel the ongoing confirmed commit. We personally do not foresee a need to have a second client connected to sysrepo, but we wanted to make sure you were aware of this possible limitation going forward.
In Michal's last response, it was suggested to implicitly lock the running datastore at the beginning of every confirmed commit. We were afraid this might interfere with confirmed commits with persist-ids. In this situation, we would expect a second netopeer2 session to be allowed to perform commits on the running datastore, regardless of which netopeer2 session started the confirmed commit. During a persistent confirmed commit, the original user may also disconnect, which would normally drop any sysrepo locks associated with the session. Also, based on the snippet of the NETCONF RFC below, we thought that any user could still issue a normal edit-config if the datastore is unlocked, but those changes have the potential to be removed or altered during a cancel commit. We understood this to be a strong suggestion for the user of the system to place a lock before starting a confirming commit, not as a requirement of the implementation to do it automatically. Please let us know if you feel differently.

Snippet from RFC 6241 section 8.4.1:
For shared configurations, this feature can cause other configuration changes (for example, via other NETCONF sessions) to be inadvertently altered or removed, unless the configuration locking feature is used (in other words, the lock is obtained before the <edit-config> operation is started).  Therefore, it is strongly suggested that in order to use this feature with shared configuration datastores, configuration locking SHOULD also be used.

We would still like to discuss further the use of the unmodified sr_commit to revert the data. In the unlikely case that a subscriber rejects the previous running datastore state, how should we expect netopeer2 to react? For confirmed commits with a large number of changes, it seems potentially dangerous that a single subscriber could block all changes. If the sr_commit was unsuccessful, how should netopeer2 behave? Right now, we only see two main options.
* return rpc-error in confirmed-commit response, but hold on to the internal confirmed commit context so the user can try again
* return rpc-error in confirmed-commit response, and throw away internal confirmed commit context (essentially confirming the commit)
We would prefer the first option, but this could get tricky if sysrepo returns an error during a timeout or a kill/close-session. Do you have any thoughts on this?

Thanks,
Adam Weast

This message has been classified Public by ADAM WEAST on Tuesday, May 08, 2018 at 12:59:14 PM.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.sysrepo.org/archives/sysrepo-devel/attachments/20180508/3df0cbeb/attachment.html>
Next message (by thread): [sysrepo-devel] Confirmed Commit Capability (RFC 6241 8.4)
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the sysrepo-devel mailing list