[sysrepo-devel] ?==?utf-8?q? Confirmed Commit Capability (RFC 6241 8.4)

Michal Vaško mvasko at cesnet.cz
Mon Apr 23 08:54:28 UTC 2018
Previous message (by thread): [sysrepo-devel] Confirmed Commit Capability (RFC 6241 8.4)
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi Adam,
here is my input on this.

I agree with others that confirmed commit should be implemented in netopeer2-server, if possible, not sysrepo. Specific comment included inline.

On Saturday, April 21, 2018 00:11 CEST, ADAM WEAST <ADAM.WEAST at adtran.com> wrote: 
 
> Mikael / Radek, 
> Thank you once again for your responses.
> 
> Our main goal is to get NETCONF confirmed commit functionality within netopeer2. We do not have any specific use cases for requiring the functionality in sysrepo specifically. However, we can see a use case for a remotely-accessed CLI client to sysrepo that would also desire the same confirmed commit functionality. For example, the JUNOS CLI provides confirmed-commit functionality. We originally investigated placing all of the logic in netopeer2 but we had some initial thoughts and concerns that caused us to shift to a sysrepo-focused implementation instead. We did some brainstorming to come up with some ideas associated with moving the logic out of sysrepo into netopeer2. This is based on our current understanding of the system. Please let us know if we appear to be misunderstanding any part of the system or if we are over-thinking any part of the design.

I will just comment on the remote CLI, there is no such thing (and not planned) on level of sysrepo. NETCONF interface is meant to be used, maybe RESTCONF will be implemented later. So this confirmed commit would be available only to NETCONF, which we believe is fine.

> 
> To start, assume we perform all of the logic within netopeer2. Netopeer2 would be in charge of getting the current running store before a confirmed <commit>. Netopeer2 would own any timers used for confirmed <commit> timeouts. When a timeout occurs or a <cancel-commit> is received, netopeer2 would be in charge of generating the inverse operations and applying those changes to the datastore. At a minimum, sysrepo needs to provide the following operations:
>   1. allow sysrepo clients to revert to a previous datastore state
>   2. allow sysrepo clients to prevent other sysrepo clients from acquiring datastore locks.

Not sure if only wrong word use but we are always talking about NETCONF, NETCONF clients must be able to revert to previous datastore state (<cancel-commit>, for instance) and NETCONF clients must work according to NETCONF specification. I think the current sysrepo locks are enough to enforce everything the confirmed-commit capability requires but I may be overlooking something.

> 
> Operation 1
> ============
> Normally, the sysrepo commit uses three stages of notifications, Verify, Apply and Abort. If netopeer2 needs to revert to a previous datastore state, we believe that we would need to skip the Verify notification stage and go straight to Abort. In theory, there should not be any issues since the datastore would be reverting to a previous good state, but we do not believe any subscribed service should be able to block a reverting sysrepo commit used during a timeout or <cancel-commit>. Additionally the user session that triggers a sysrepo commit revert may not have permissions to revert every change that needs to be reverted in order to conform to the NETCONF RFC requirements for confirmed-commit. To allow this, we would need a way to perform a sysrepo commit operation with only Abort notifications and no NACM / ACL checks. Introducing a sr_force_commit type command could allow netopeer2 to revert the <commit> as required by the RFC, but this would also open up a possible vulnerability that a malicious client could take advantage of. In other words, this option prevents sysrepo from adequately authorizing changes to the datastore. We assume you and others will find this to be unacceptable.

Actually, I do not think any sysrepo change is needed whatsoever. <commit> with confirmed will simply call sr_commit() and <cancel-commit> will, again, simply call sr_commit() but with <candidate> having the old configuration thus restoring it completely transparently to sysrepo. Is it really necessary to force the <cancel-commit> even though some partial rollback change could not be applied? This is a specific case which is, I think, not worth to be covered using some new sysrepo API. You propose to simply ignore such failures and pretend the configuration was reverted, is that a better behaviour?

> 
> One other item that might be important to note if the confirmed <commit> operation is moved outside of sysrepo. Sysrepo clients will need to implement a means of persisting the old configuration outside of sysrepo. For instance, if netopeer2 were to crash during an ongoing confirmed <commit>, we would lose the previous datastore state unless we implement persistent storage within netopeer2. This could be done for each client, but a similar robust solution would be needed for each. If you expect a proliferation of additional clients outside of netopeer2, would you expect all of them that want a confirmed <commit> capability to implement a robust solution that sysrepo could count on for authorization and data integrity?
> 
> So, a more acceptable solution could give sysrepo new client library functions to save the current state of the session's currently selected datastore, but require netopeer2 to handle the timeout and persist-id logic. For this idea, we envision adding client library functions that follow this idea:
>   * sr_save_snapshot - saves a snapshot of the current state of the current datastore - used in NETCONF initial confirmed <commit>
>   * sr_discard_snapshot - discards the saved datastore information - used in NETCONF confirming <commit>
>   * sr_revert_to_snapshot - reverts the datastore to the snapshot contents - used in timeout or NETCONF <cancel-commit>
>   * sr_has_snapshot - indicates if there is currently a saved snapshot for the current datastore
> 
> Problem 1
> ============
> This allows sysrepo to still maintain some level of authorization and access control but it introduces additional problems. Our main concern with this option comes from the need for a cleanup procedure. Who is responsible for the snapshot and cleanup, i.e. sysrepo or the sysrepo client (netopeer2)? For example, if a netopeer2 client issued a confirmed <commit>, netopeer2 would issue a sr_save_snapshot before issuing sr_copy_config. If netopeer2-server were to crash before the snapshot could be discarded or reverted, how and when does sysrepo cleanup the snapshot? 
> The cleanup becomes especially important as Operation 2 is considered. If sysrepo allows a confirmed commit operation (whether implemented in netopeer2 / CLI or sysrepo) to prevent locks against the running datastore, then proper cleanup is critical to operations.
> 
> Possible solution #1 to problem 1 - Sysrepo is responsible for clean up
> ============
> We could let sysrepo have the responsibility for the snapshot and cleanup. There are two options that we thought of for this that could be used together or separately.
> 
> * Sysrepo could clean up the snapshot after a specified time. This would be analogous to the confirmed-commit timeout
> * Sysrepo could associate the snapshot with a session and if the session closes, the datastore would automatically be reverted to the snapshot. 
> If we associate a snapshot with a session to perform cleanup, it is important to note that the NETCONF confirmed commit operation allows more than one NETCONF session to be involved (in the case of persist-id). The session that originated the confirmed commit could even disconnect and another session could send the confirming <commit>, amend, or cancel the confirmed <commit>. So this scenario would require more than one sysrepo session. There would be one or more user sessions that originate, amend, or cancel the confirmed <commit>. And there would be an administrative session responsible for associating with the snapshot and controlling its lifetime. This administrative connection should most likely be hidden from other NETCONF or sysrepo users. In other words, it seems desirable that these sessions do not show up in audits, current session lists, etc.
> 
> Possible solution #2 to problem 1 - Sysrepo clients are responsible for snapshot cleanup
> ============
> Sysrepo does not associate anything with a snapshot. If a client disconnects or crashes after creating a snapshot, that snapshot will persist until any client reconnects and discards/reverts. This would require more cleanup logic within the sysrepo clients (netopeer2). This brings up another issue if the sysrepo clients can misbehave either maliciously or even if they crash. For example, if netopeer2 starts and sees an active snapshot, should he assume that he created the snapshot earlier and needs to clean it up now? If a different client (not netopeer2) were to connect and see an active snapshot, how does he know if the client who originally created the snapshot is still active?
> 
> This could be resolved by sharing a token between sysrepo and the sysrepo client (netopeer2) that is associated with the snapshot. This helps to identify the owner of the current snapshot and helps synchronize cleanups between sysrepo clients when reboots and crashes happen. This would be similar to the NETCONF confirmed commit persist-id.

Regarding this whole proposal, I think we can, again, refrain from doing any changes to sysrepo API. You are right that this would require a global persistent storage of ongoing commits in netopeer2. Is there any problem with using normal XML files accessible only to root? In the end, if that is not enough then sysrepo is not secure enough because it relies on the same mechanism with its datastores.

> 
> Operation 2
> ============
> The following requirement is listed in NETCONF RFC 6241 Section 7.5.
> 
>     A lock MUST NOT be granted if any of the following conditions is true:
> 
>           *  The target configuration is <running>, and another NETCONF
>              session has an ongoing confirmed commit (Section 8.4).
> 
> This is not the exact same as acquiring the lock before performing a confirmed <commit> operation. A NETCONF <lock> and sr_lock_datastore() operation both unlock when the session ends. A NETCONF confirmed <commit> operation with persist-id may have a duration that extends beyond a single session lifetime. The RFC also does not require that the session executing a confirmed <commit> operation actually hold the lock.
> 
> In order to push as much logic into the client rather than sysrepo, then NETCONF requires sysrepo to provide a mechanism to prevent lock operations. As noted above this must extend beyond a single sysrepo session. An example proposal is described below.
> 
> * sr_prevent_lock_operations - This function prevents other sessions from acquiring locks for the target datastore. It cannot acquire the locks but it could fail if the locks are already held.
> * sr_allow_lock_operations - This function undoes the sr_prevent_lock_operations and resumes normal locking behavior.
> 
> The problems associated with this operation are centered around clean up similar to the snapshots described above. If sysrepo has the responsibility for clean up then we will need a timer and / or a hidden, administrative session. If the clients are performing this clean up, then a mechanism to determine which client is responsible seems necessary.

Here is our idea. Everything will work fine on NETCONF level. A client starts a confirmed <commit>, netopeer2 remembers this so if another session want <running> <lock>, it will fail. However, there is a problem on sysrepo level. For example, if there is ongoing confirmed commit, another client calls sr_lock() on <running>, and the <commit> should be cancelled, it would fail because of the lock. The simplest solution seems to implicitly lock <running> at the beginning of every confirmed commit until it is finished, it is compliant to the specification, I think.

> 
> ============
> Overall, by our estimation moving this outside sysrepo still requires quite a bit of changes to sysrepo. More importantly the problems and solutions described above feel less robust and possibly even more complex in general than if we let sysrepo manage the whole process. 
> In our view a shared data repository wants to maintain the internal consistency, integrity, and authenticity of data for a set of clients. As the data repository gives more control to external clients, the harder and more complicated it becomes for it to perform its job well. We are relatively new to the sysrepo project and we are uncertain about what you feel is the scope and responsibility of the project so we could be wrong in assuming that sysrepo wants all of this control and these responsibilities.
> 
> Also, while we do not have an immediate use case for multiple different sysrepo clients needing confirmed commit, we do see the possibility that other clients would want it for the same reasons that NETCONF found it beneficial. This does not require that a sysrepo confirmed commit operation be the exact same as the NETCONF one, only that they be compatible.
> 
> In conclusion, we feel that sysrepo is the best place both technically and conceptually for the implementation. We concede the possibility that we might be exaggerating some of the outlined problems or that we may have missed other, better solutions, so please let us know if that is the case. 
> If we continue down the path of implementing this within sysrepo, then in response to one of Radek's comments on our original proposed design, we do agree with the suggestion to preserve sr_commit() and introducing an sr_confirmed_commit(). Following that idea, we would also introduce a sr_confirmed_copy_config() and leave sr_copy_config() unmodified.

I would very much like to keep sysrepo as complex as it is now and try not to add more to it if not necessary, there is already a lot in it. So if you do not have some critical problems with the proposed ideas, we should follow them.

Regards,
Michal

> -----Original Message-----
> From: Mikael Abrahamsson [mailto:swmike at swm.pp.se] Sent: Wednesday, April 18, 2018 3:22 AM
> To: ADAM WEAST <ADAM.WEAST at adtran.com>
> Cc: sysrepo-devel at sysrepo.org
> Subject: Re: [sysrepo-devel] Confirmed Commit Capability (RFC 6241 8.4)
> 
> On Tue, 17 Apr 2018, Radek Krejčí wrote:
> 
> > Personally, when I was thinking about confirmed commit, it was much > more NETCONF feature to me.
> 
> This has been my thought as well, commit-confirmed is used when one fears that the new configuration will cause network connectivity problems, and thus want to make sure the device is reachable after the new configuration has been applied. Otherwise configuration should be rolled back.
> 
> This means commit-confirmed can be implemented in the Netopeer2-server part instead of doing it in Sysrepo datastore process itself. We'd like to keep Sysrepod quite simple and self-contained, where possible.
> 
> If you have a different use-case, it would be interesting to hear about it.
> 
> -- Mikael Abrahamsson    email: swmike at swm.pp.se
> This message has been classified Public by ADAM WEAST on Friday, April 20, 2018 at 5:11:52 PM.
> 
> _______________________________________________
> sysrepo-devel mailing list
> sysrepo-devel at sysrepo.org
> http://lists.sysrepo.org/listinfo/sysrepo-devel
Previous message (by thread): [sysrepo-devel] Confirmed Commit Capability (RFC 6241 8.4)
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the sysrepo-devel mailing list