System Requirements
- Requires Java 11 (OpenJDK or Amazon JDK)
- Supported Operating Systems:
  - Red Hat Enterprise Linux
  - CentOS
  - Rocky Linux
- Supported Web Browsers:
  - Microsoft Edge: Current & (Current - 1)
  - Mozilla Firefox: Current & (Current - 1)
  - Google Chrome: Current & (Current - 1)
  - Safari: Current & (Current - 1)
Under sustained and extremely high throughput, the CodeCache settings may need to be tuned to avoid sudden performance loss. See the Bootstrap Properties section for more information.
Starting Clockspring
Linux
From the <installdir>/bin directory, execute the following commands by typing ./clockspring.sh <command>:
- start: starts Clockspring in the background
- stop: stops a Clockspring service that is running in the background
- status: provides the current status of the Clockspring instance
- install: installs Clockspring as a service that can then be controlled via
  - service clockspring start
  - service clockspring stop
  - service clockspring status
When Clockspring starts for the first time, the following files and directories are created:
- content_repository
- database_repository
- flowfile_repository
- provenance_repository
- work directory
- logs directory
- Within the conf directory, the flow.xml.gz file is created
See the System Properties section of this guide for more information about configuring repositories and configuration files.
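The first-start list above can be turned into a quick sanity check after installation. The following is a minimal sketch, assuming the repositories, work, and logs directories sit directly under the install directory (their locations are configurable, so adjust the paths if yours differ). The function name missing_first_start_paths is illustrative and not part of Clockspring.

```shell
# Print each expected first-start path that is missing under the given
# install directory. Assumes default locations directly under <installdir>.
missing_first_start_paths() {
  installdir=$1
  for path in content_repository database_repository flowfile_repository \
              provenance_repository work logs conf/flow.xml.gz; do
    [ -e "$installdir/$path" ] || printf '%s\n' "$path"
  done
}
```

Calling missing_first_start_paths with your install directory prints nothing when every expected path exists.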
Port Configuration
The following table lists the default ports used by Clockspring and the corresponding property in the clockspring.properties file.
Function | Property | Default Value |
---|---|---|
HTTPS Port | | 8443 |
Remote Input Socket Port* | | |
Cluster Node Protocol Port* | | |
Cluster Node Load Balancing Port | | |
Web HTTP Forwarding Port | | none |
Embedded ZooKeeper
The following table lists the default ports used by an embedded ZooKeeper server and the corresponding property in the zookeeper.properties file.
Function | Property | Default Value |
---|---|---|
ZooKeeper Server Quorum and Leader Election Ports | | none |
Commented examples for the ZooKeeper server ports are included in the zookeeper.properties file in the form server.N=clockspring-nodeN-hostname:2888:3888;2181.
Configuration Best Practices
Typical Linux defaults are not necessarily well-tuned for the needs of an IO intensive application like Clockspring.
- Maximum File Handles
Clockspring may have a very large number of file handles open at any one time. Increase the limits by editing /etc/security/limits.conf to add something like
* hard nofile 50000
* soft nofile 50000
- Maximum Forked Processes
Clockspring may be configured to generate a significant number of threads. To increase the allowable number, edit /etc/security/limits.conf to add
* hard nproc 10000
* soft nproc 10000
Your distribution may also require an edit to /etc/security/limits.d/90-nproc.conf by adding
* soft nproc 10000
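To confirm that the new limits are in effect, compare the values reported by ulimit with the numbers above. This is a small sketch; the helper name limit_ok is illustrative. Run it in a fresh login session as the user that runs Clockspring, since limits.conf changes normally apply only to new sessions.

```shell
# Succeed when a value reported by `ulimit` meets the required minimum.
# The literal value "unlimited" always passes.
limit_ok() {
  current=$1
  required=$2
  [ "$current" = "unlimited" ] || [ "$current" -ge "$required" ]
}

# Check the open-file limit of the current shell session.
# In bash, `ulimit -u` reports the process limit and can be checked the same way.
limit_ok "$(ulimit -n)" 50000 || echo "open files limit is below 50000"
```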
- Increase the number of TCP socket ports available
This is particularly important if your flow will be setting up and tearing down a large number of sockets in a short period of time.
sudo sysctl -w net.ipv4.ip_local_port_range="10000 65000"
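To see how many local ports a given range provides, the two bounds can be subtracted. A minimal sketch follows; the function name port_range_size is illustrative.

```shell
# Count the ports in a "low high" range string, in the format printed by
# `sysctl -n net.ipv4.ip_local_port_range`.
port_range_size() {
  set -- $1                # split "low high" into $1 and $2
  echo $(( $2 - $1 + 1 ))
}

port_range_size "10000 65000"   # prints 55001
```

On a running system, port_range_size "$(sysctl -n net.ipv4.ip_local_port_range)" reports the size of the range currently in effect.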
- Set how long sockets stay in a TIME_WAIT state when closed
Sockets should not linger for long, because the system needs to set up and tear down new sockets quickly. It is a good idea to read more about this setting and adjust it to something like
for kernel 2.6
sudo sysctl -w net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait="1"
for kernel 3.0
sudo sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait="1"
- Disable swap
Leaving swap enabled may cause performance issues as RAM is written to and retrieved from the disk. To configure Linux with no swap, edit /etc/sysctl.conf to add the following line
vm.swappiness = 0
Recommended Antivirus Exclusions
Antivirus software can take a long time to scan large directories and the numerous files within them. Additionally, if the antivirus software locks files or directories during a scan, those resources are unavailable to Clockspring processes, causing latency or unavailability of these resources. To prevent these performance and reliability issues from occurring, it is highly recommended to configure your antivirus software to skip scans on the following directories:
- content_repository
- flowfile_repository
- logs
- provenance_repository
- state
Logging Configuration
NiFi uses logback as the runtime logging implementation. The conf directory contains a standard logback.xml configuration with default appender and level settings. The logback manual provides a complete reference of available options.
Standard Log Files
The standard logback configuration includes the following appender definitions and associated log files:
File | Description |
---|---|
|
Application log containing framework and component messages |
|
Bootstrap log containing startup and shutdown messages |
|
Deprecation log containing warnings for deprecated components and features |
|
HTTP request log containing user interface and REST API access messages |
|
User log containing authentication and authorization messages |
Deprecation Logging
The nifi-deprecation.log contains warning messages describing components and features that will be removed in subsequent versions. Deprecation warnings should be evaluated and addressed to avoid breaking changes when upgrading to a new major version. Resolving deprecation warnings involves upgrading to new components, changing component property settings, or refactoring custom component classes.
Deprecation logging provides a method for checking compatibility before upgrading from one major release version to another. Upgrading to the latest minor release version will provide the most accurate set of deprecation warnings.
It is important to note that deprecation logging applies to both components and features. Logging for deprecated features requires a runtime reference to the property or method impacted. Disabled components with deprecated properties or methods will not generate deprecation logs. For this reason, it is important to exercise all configured components long enough to exercise standard flow behavior.
Deprecation logging can generate repeated messages depending on component configuration and usage patterns. Deprecation logging for a specific component class can be disabled by adding a logger element to logback.xml. The name attribute must start with deprecation, followed by the component class. Setting the level attribute to OFF disables deprecation logging for the component specified.
<logger name="deprecation.org.apache.nifi.processors.ListenLegacyProtocol" level="OFF" />
Security Configuration
By default Clockspring will generate a self-signed SSL certificate and listen on port 8443. This can be updated to use a CA-generated certificate by updating the following values in the clockspring.properties file:
Property Name | Description |
---|---|
|
Filename of the Keystore that contains the server’s private key. |
|
The type of Keystore. Must be |
|
The password for the Keystore. |
|
The password for the certificate in the Keystore. If not set, the value of |
|
Filename of the Truststore that will be used to authorize those connecting to Clockspring. A secured instance with no Truststore will refuse all incoming connections. |
|
The type of the Truststore. Must be |
|
The password for the Truststore. |
Automatic refreshing of Clockspring’s web SSL context factory can be enabled using the following properties:
Property Name | Description |
---|---|
|
Specifies whether the SSL context factory should be automatically reloaded if updates to the keystore and truststore are detected. By default, it is set to |
|
Specifies the interval at which the keystore and truststore are checked for updates. Only applies if |
Once the nifi.security.autoreload.enabled property is set to true, any valid changes to the configured keystore and truststore will cause the SSL context factory to be reloaded, allowing clients to pick up the changes. This is intended to allow expired certificates to be updated in the keystore and new trusted certificates to be added in the truststore, all without having to restart the service.
Changes to any of the nifi.security.keystore* or nifi.security.truststore* properties will not be picked up by the auto-refreshing logic, which assumes the passwords and store paths will remain the same.
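When verifying settings such as nifi.security.autoreload.enabled, it can help to read the effective value straight from the properties file. The helper below is a sketch with an illustrative name (prop_value). It handles only simple key=value lines with no spaces around the equals sign, and the conf/clockspring.properties path in the usage note is an assumption about the install layout.

```shell
# Print the value of a key from a Java-style .properties file.
# Comment lines and keys with surrounding spaces are not matched.
prop_value() {
  file=$1
  key=$2
  # Match on the text before the first "=", then strip "key=" from the line
  # so that values containing "=" are printed whole.
  awk -F'=' -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); print; exit }' "$file"
}
```

For example, prop_value conf/clockspring.properties nifi.security.autoreload.enabled prints the configured value, or nothing if the property is not set.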
TLS Cipher Suites
The Java Runtime Environment provides the ability to specify custom TLS cipher suites to be used by servers when accepting client connections. See here for more information. To enable this feature the following properties may be set:
Property Name | Description |
---|---|
|
Set of ciphers that are available to be used by incoming client connections. Replaces system defaults if set. |
|
Set of ciphers that must not be used by incoming client connections. Filters available ciphers if set. |
Each property should take the form of a comma-separated list of common cipher names as specified here. Regular expressions (for example ^.*GCM_SHA256$) may also be specified.
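To preview which suites a regular expression such as ^.*GCM_SHA256$ would select, the pattern can be tried against a list of cipher names. The sketch below uses grep -E, whose syntax agrees with Java regular expressions for simple patterns like this one but is not identical in general. The cipher list is a small illustrative sample, and matching_ciphers is a hypothetical helper name.

```shell
# Illustrative sample of standard cipher suite names.
ciphers='TLS_AES_128_GCM_SHA256
TLS_AES_256_GCM_SHA384
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA'

# Print the names matched by a regular expression, one per line.
matching_ciphers() {
  printf '%s\n' "$ciphers" | grep -E -- "$1"
}

matching_ciphers '^.*GCM_SHA256$'
```

With this sample, only the two suites ending in GCM_SHA256 are printed.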
The semantics match the use of the following Jetty APIs:
User Authentication
Clockspring supports user authentication via client certificates, via username/password, or via OpenId Connect.
Username/password authentication is performed by a 'Login Identity Provider', a pluggable mechanism for authenticating users via their username and password. The Login Identity Provider to use is configured in the clockspring.properties file. Currently Clockspring offers Login Identity Provider options for Single User, Lightweight Directory Access Protocol (LDAP), and Kerberos.
The nifi.login.identity.provider.configuration.file property specifies the configuration file for Login Identity Providers. By default, this property is set to ./conf/login-identity-providers.xml.
The nifi.security.user.login.identity.provider property indicates which of the configured Login Identity Providers should be used. The default value of this property is single-user-provider, supporting authentication with a generated username and password.
During OpenId Connect authentication, Clockspring will redirect users to login with the Provider and then return them to the Clockspring canvas upon successful authentication.
Clockspring does not support running multiple authentication providers concurrently.
Single User
The default Single User Login Identity Provider supports automated generation of username and password credentials.
The default username is 'admin'. The generated password will be a random string consisting of 32 characters and stored using bcrypt hashing.
The default configuration in clockspring.properties enables Single User authentication:
nifi.security.user.login.identity.provider=single-user-provider
The default login-identity-providers.xml includes a blank provider definition:
<provider>
    <identifier>single-user-provider</identifier>
    <class>org.apache.nifi.authentication.single.user.SingleUserLoginIdentityProvider</class>
    <property name="Username"/>
    <property name="Password"/>
</provider>
The following command can be used to change the Username and Password:
$ ./bin/clockspring.sh set-single-user-credentials <username> <password>
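When choosing replacement credentials, a random password can be generated in the shell before it is passed to the command above. This is a sketch only: the alphanumeric character set is an assumption and not necessarily what Clockspring uses for its generated password, and the username flowadmin is a placeholder.

```shell
# Generate a random 32-character alphanumeric password from /dev/urandom.
random_password() {
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 32
}

password=$(random_password)

# Hypothetical usage, run from the Clockspring install directory:
# ./bin/clockspring.sh set-single-user-credentials flowadmin "$password"
```

Note that arguments passed on the command line may be visible to other local users in the process list while the command runs.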
Lightweight Directory Access Protocol (LDAP)
Below is an example and description of configuring a Login Identity Provider that integrates with a Directory Server to authenticate users.
Set the following in clockspring.properties to enable LDAP username/password authentication:
nifi.security.user.login.identity.provider=ldap-provider
Modify login-identity-providers.xml to enable the ldap-provider. Here is the sample provided in the file:
<provider>
    <identifier>ldap-provider</identifier>
    <class>org.apache.nifi.ldap.LdapProvider</class>
    <property name="Authentication Strategy">START_TLS</property>
    <property name="Manager DN"></property>
    <property name="Manager Password"></property>
    <property name="TLS - Keystore"></property>
    <property name="TLS - Keystore Password"></property>
    <property name="TLS - Keystore Type"></property>
    <property name="TLS - Truststore"></property>
    <property name="TLS - Truststore Password"></property>
    <property name="TLS - Truststore Type"></property>
    <property name="TLS - Client Auth"></property>
    <property name="TLS - Protocol"></property>
    <property name="TLS - Shutdown Gracefully"></property>
    <property name="Referral Strategy">FOLLOW</property>
    <property name="Connect Timeout">10 secs</property>
    <property name="Read Timeout">10 secs</property>
    <property name="Url"></property>
    <property name="User Search Base"></property>
    <property name="User Search Filter"></property>
    <property name="Identity Strategy">USE_DN</property>
    <property name="Authentication Expiration">12 hours</property>
</provider>
The ldap-provider has the following properties:
Property Name | Description |
---|---|
Authentication Strategy | How the connection to the LDAP server is authenticated. Possible values are |
Manager DN | The DN of the manager that is used to bind to the LDAP server to search for users. |
Manager Password | The password of the manager that is used to bind to the LDAP server to search for users. |
TLS - Keystore | Path to the Keystore that is used when connecting to LDAP using LDAPS or START_TLS. |
TLS - Keystore Password | Password for the Keystore that is used when connecting to LDAP using LDAPS or START_TLS. |
TLS - Keystore Type | Type of the Keystore that is used when connecting to LDAP using LDAPS or START_TLS (i.e. |
TLS - Truststore | Path to the Truststore that is used when connecting to LDAP using LDAPS or START_TLS. |
TLS - Truststore Password | Password for the Truststore that is used when connecting to LDAP using LDAPS or START_TLS. |
TLS - Truststore Type | Type of the Truststore that is used when connecting to LDAP using LDAPS or START_TLS (i.e. |
TLS - Client Auth | Client authentication policy when connecting to LDAP using LDAPS or START_TLS. Possible values are |
TLS - Protocol | Protocol to use when connecting to LDAP using LDAPS or START_TLS. (i.e. |
TLS - Shutdown Gracefully | Specifies whether the TLS should be shut down gracefully before the target context is closed. Defaults to false. |
Referral Strategy | Strategy for handling referrals. Possible values are |
Connect Timeout | Duration of connect timeout. (i.e. |
Read Timeout | Duration of read timeout. (i.e. |
Url | Space-separated list of URLs of the LDAP servers (i.e. |
User Search Base | Base DN for searching for users (i.e. |
User Search Filter | Filter for searching for users against the |
Identity Strategy | Strategy to identify users. Possible values are |
Authentication Expiration | The duration of how long the user authentication is valid for. If the user never logs out, they will be required to log back in following this duration. |
For changes to clockspring.properties and login-identity-providers.xml to take effect, Clockspring must be restarted. If the environment is clustered, configuration files must be the same on all nodes.
Kerberos
Below is an example and description of configuring a Login Identity Provider that integrates with a Kerberos Key Distribution Center (KDC) to authenticate users.
Set the following in clockspring.properties to enable Kerberos username/password authentication:
nifi.security.user.login.identity.provider=kerberos-provider
Modify login-identity-providers.xml to enable the kerberos-provider. Here is the sample provided in the file:
<provider>
    <identifier>kerberos-provider</identifier>
    <class>org.apache.nifi.kerberos.KerberosProvider</class>
    <property name="Default Realm">NIFI.APACHE.ORG</property>
    <property name="Authentication Expiration">12 hours</property>
</provider>
The kerberos-provider has the following properties:
Property Name | Description |
---|---|
Default Realm | Default realm to provide when user enters incomplete user principal (i.e. |
Authentication Expiration | The duration of how long the user authentication is valid for. If the user never logs out, they will be required to log back in following this duration. |
See also the Kerberos Service section to allow single sign-on access via client Kerberos tickets.
For changes to clockspring.properties and login-identity-providers.xml to take effect, Clockspring must be restarted. If the environment is clustered, configuration files must be the same on all nodes.
OpenId Connect
To enable authentication via OpenId Connect the following properties must be configured in clockspring.properties.
Property Name | Description |
---|---|
|
The discovery URL for the desired OpenId Connect Provider (http://openid.net/specs/openid-connect-discovery-1_0.html). |
|
Connect timeout when communicating with the OpenId Connect Provider. |
|
Read timeout when communicating with the OpenId Connect Provider. |
|
The client id for Clockspring after registration with the OpenId Connect Provider. |
|
The client secret for Clockspring after registration with the OpenId Connect Provider. |
|
The preferred algorithm for validating identity tokens. If this value is blank, it will default to |
|
Comma separated scopes that are sent to OpenId Connect Provider in addition to |
|
Claim that identifies the user to be logged in; default is |
|
Comma separated possible fallback claims used to identify the user in case |
SAML
To enable authentication via SAML the following properties must be configured in clockspring.properties.
Configuring a Metadata URL and an Entity Identifier enables Apache NiFi to act as a SAML 2.0 Relying Party, allowing users to authenticate using an account managed through a SAML 2.0 Asserting Party.
Property Name | Description |
---|---|
|
The URL for obtaining the identity provider’s metadata. The metadata can be retrieved from the identity provider via |
|
The entity id of the service provider. This value will be used as the |
|
The name of a SAML assertion attribute containing the user’s identity. This property is optional and if not specified, or if the attribute is not found, then the NameID of the Subject will be used. |
|
The name of a SAML assertion attribute containing group names the user belongs to. This property is optional, but if populated the groups will be passed along to the authorization process. |
|
Controls the value of |
|
Controls the value of |
|
The algorithm to use when signing SAML messages. Reference the Open SAML Signature Constants for a list of valid values. If not specified, a default of SHA-256 will be used. The default value is |
|
The expiration of the NiFi JWT that will be produced from a successful SAML authentication response. The default value is |
|
Enables SAML SingleLogout which causes a logout from NiFi to logout of the identity provider. By default, a logout of NiFi will only remove the NiFi JWT. The default value is |
|
The truststore strategy when the IDP metadata URL begins with https. A value of |
|
The connection timeout when communicating with the SAML IDP. The default value is |
|
The read timeout when communicating with the SAML IDP. The default value is |
SAML REST Resources
SAML authentication enables the following REST API resources for integration with a SAML 2.0 Asserting Party:
Resource Path | Description |
---|---|
/nifi-api/access/saml/local-logout/request | Complete SAML 2.0 Logout processing without communicating with the Asserting Party |
/nifi-api/access/saml/login/consumer | Process SAML 2.0 Login Requests assertions using HTTP-POST or HTTP-REDIRECT binding |
/nifi-api/access/saml/metadata | Retrieve SAML 2.0 entity descriptor metadata as XML |
/nifi-api/access/saml/single-logout/consumer | Process SAML 2.0 Single Logout Request assertions using HTTP-POST or HTTP-REDIRECT binding. Requires Single Logout to be enabled. |
/nifi-api/access/saml/single-logout/request | Complete SAML 2.0 Single Logout processing initiating a request to the Asserting Party. Requires Single Logout to be enabled. |
JSON Web Tokens
Clockspring uses JSON Web Tokens to provide authenticated access after the initial login process. Generated JSON Web Tokens include the authenticated user identity as well as the issuer and expiration from the configured Login Identity Provider.
Clockspring uses generated RSA Key Pairs with a key size of 4096 bits to support the PS512 algorithm for JSON Web Signatures. The system stores RSA Public Keys using the configured local State Provider and retains the RSA Private Key in memory. This approach supports signature verification for the expiration configured in the Login Identity Provider without persisting the private key.
JSON Web Token support includes revocation on logout using JSON Web Token Identifiers. The system denies access for expired tokens based on the Login Identity Provider configuration, but revocation invalidates the token prior to expiration. The system stores revoked identifiers using the configured local State Provider and runs a scheduled command to delete revoked identifiers after the associated expiration.
The following settings can be configured in clockspring.properties to control JSON Web Token signing.
Property Name | Description |
---|---|
|
JSON Web Signature Key Rotation Period defines how often the system generates a new RSA Key Pair, expressed as an ISO 8601 duration. The default is one hour: |
Authorization
Authorizer Configuration
An 'authorizer' grants users the privileges to manage users and policies by creating preliminary authorizations at startup.
Authorizers are configured using two properties in the clockspring.properties file:
- The nifi.authorizer.configuration.file property specifies the configuration file where authorizers are defined. By default, the authorizers.xml file located in the root installation conf directory is selected.
- The nifi.security.user.authorizer property indicates which of the configured authorizers in the authorizers.xml file to use.
Authorizers.xml Setup
The authorizers.xml file is used to define and configure available authorizers. The default authorizer is the StandardManagedAuthorizer. The managed authorizer is composed of a UserGroupProvider and an AccessPolicyProvider. The users, groups, and access policies are loaded and optionally configured through these providers. The managed authorizer makes all access decisions based on these provided users, groups, and access policies.
During startup there is a check to ensure that there are no two users/groups with the same identity/name. This check is executed regardless of the configured implementation. This is necessary because this is how users/groups are identified and authorized during access decisions.
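Before combining identities from several sources, a similar uniqueness check can be run by hand on a plain list of names. The sketch below compares identities as exact, case-sensitive strings, which is an assumption; the startup check's own comparison rules are not described here. The helper name duplicate_identities and the sample identities are illustrative.

```shell
# Print every identity that appears more than once on stdin (one per line).
duplicate_identities() {
  sort | uniq -d
}

printf '%s\n' 'alice' 'bob' 'CN=node1, OU=Clockspring' 'alice' | duplicate_identities
```

With this sample input, the only line printed is alice.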
FileUserGroupProvider
The default UserGroupProvider is the FileUserGroupProvider; however, you can develop additional UserGroupProviders as extensions. The FileUserGroupProvider has the following properties:
- Users File - The file where the FileUserGroupProvider stores users and groups. By default, the users.xml in the conf directory is chosen.
- Legacy Authorized Users File - The full path to an existing authorized-users.xml that will automatically be used to load the users and groups into the Users File.
- Initial User Identity - The identity of a user or system to seed the Users File. The name of each property must be unique, for example: "Initial User Identity A", "Initial User Identity B", "Initial User Identity C" or "Initial User Identity 1", "Initial User Identity 2", "Initial User Identity 3"
LdapUserGroupProvider
Another option for the UserGroupProvider is the LdapUserGroupProvider. By default, this option is commented out but can be configured in lieu of the FileUserGroupProvider. This will sync users and groups from a directory server and will present them in the UI in read only form.
The LdapUserGroupProvider has the following properties:
Property Name | Description |
---|---|
|
How the connection to the LDAP server is authenticated. Possible values are |
|
The DN of the manager that is used to bind to the LDAP server to search for users. |
|
The password of the manager that is used to bind to the LDAP server to search for users. |
|
Path to the Keystore that is used when connecting to LDAP using LDAPS or START_TLS. |
|
Password for the Keystore that is used when connecting to LDAP using LDAPS or START_TLS. |
|
Type of the Keystore that is used when connecting to LDAP using LDAPS or START_TLS (i.e. |
|
Path to the Truststore that is used when connecting to LDAP using LDAPS or START_TLS. |
|
Password for the Truststore that is used when connecting to LDAP using LDAPS or START_TLS. |
|
Type of the Truststore that is used when connecting to LDAP using LDAPS or START_TLS (i.e. |
|
Client authentication policy when connecting to LDAP using LDAPS or START_TLS. Possible values are |
|
Protocol to use when connecting to LDAP using LDAPS or START_TLS. (i.e. |
|
Specifies whether the TLS should be shut down gracefully before the target context is closed. Defaults to false. |
|
Strategy for handling referrals. Possible values are |
|
Duration of connect timeout. (i.e. |
|
Duration of read timeout. (i.e. |
|
Space-separated list of URLs of the LDAP servers (i.e. |
|
Sets the page size when retrieving users and groups. If not specified, no paging is performed. |
|
Sets whether group membership decisions are case sensitive. When a user or group is inferred (by not specifying a user or group search base, user identity attribute, or group name attribute), case sensitivity is enforced since the value to use for the user identity or group name would be ambiguous. Defaults to false. |
|
Duration of time between syncing users and groups. (i.e. |
|
Base DN for searching for users (i.e. |
|
Object class for identifying users (i.e. |
|
Search scope for searching users ( |
|
Filter for searching for users against the |
|
Attribute to use to extract user identity (i.e. |
|
Attribute to use to define group membership (i.e. |
|
If blank, the value of the attribute defined in |
|
Base DN for searching for groups (i.e. |
|
Object class for identifying groups (i.e. |
|
Search scope for searching groups ( |
|
Filter for searching for groups against the |
|
Attribute to use to extract group name (i.e. |
|
Attribute to use to define group membership (i.e. |
|
If blank, the value of the attribute defined in |
Any identity mapping rules specified in clockspring.properties will also be applied to the user identities. Group names are not mapped.
ShellUserGroupProvider
The ShellUserGroupProvider fetches user and group details from Unix-like systems using shell commands.
This provider executes various shell pipelines with commands such as getent on Linux and dscl on macOS.
Supported systems may be configured to retrieve users and groups from an external source, such as LDAP or NIS. In these cases the shell commands will return those external users and groups. This provides administrators another mechanism to integrate user and group directory services.
The ShellUserGroupProvider has the following properties:
Property Name | Description |
---|---|
|
Duration of initial delay before first user and group refresh. (i.e. |
|
Duration of delay between each user and group refresh. (i.e. |
|
Regular expression used to exclude groups. Default is '', which means no groups are excluded. |
|
Regular expression used to exclude users. Default is '', which means no users are excluded. |
Like LdapUserGroupProvider, the ShellUserGroupProvider is commented out in the authorizers.xml file. Refer to that comment for usage examples.
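To get a feel for what a getent-based user listing with an exclusion pattern looks like, the sketch below extracts user names from passwd-format lines and drops those matching a regular expression. It is not the pipeline the provider actually runs. The helper name list_users is hypothetical, grep -E syntax may differ from the provider's regular expression dialect, and the pattern is applied here as a search, so anchor it with ^ and $ to match whole names.

```shell
# Read passwd-format lines (as printed by `getent passwd`) on stdin and print
# the user names, dropping any that match the exclude regex.
# An empty regex excludes nothing.
list_users() {
  exclude=$1
  if [ -n "$exclude" ]; then
    cut -d: -f1 | grep -E -v -- "$exclude"
  else
    cut -d: -f1
  fi
}

# Sample input in passwd format; on Linux, pipe in `getent passwd` instead.
printf '%s\n' 'root:x:0:0:root:/root:/bin/sh' \
              'alice:x:1000:1000::/home/alice:/bin/sh' | list_users '^root$'
```

With the sample input, only alice is printed.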
AzureGraphUserGroupProvider
The AzureGraphUserGroupProvider fetches users and groups from Azure Active Directory (AAD) using the Microsoft Graph API.
A subset of groups is fetched based on filter conditions (Group Filter Prefix, Group Filter Suffix, Group Filter Substring, and Group Filter List Inclusion) evaluated against the displayName property of the Azure AD group. Member users are then loaded from these groups. At least one filter condition should be specified.
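The prefix, suffix, and substring conditions can be previewed locally against a list of candidate group names. The sketch below only illustrates the three kinds of string match; the provider evaluates its filters against Azure AD, and how it combines several conditions or treats letter case is not covered here. The function names and group names are illustrative.

```shell
# Each helper succeeds when the group name ($1) matches the condition ($2).
has_prefix()    { case $1 in "$2"*)  return 0 ;; esac; return 1; }
has_suffix()    { case $1 in *"$2")  return 0 ;; esac; return 1; }
has_substring() { case $1 in *"$2"*) return 0 ;; esac; return 1; }

# Print the sample groups whose names start with "clockspring-".
for group in clockspring-admins clockspring-operators finance-admins; do
  has_prefix "$group" 'clockspring-' && echo "$group"
done
```

With these sample names, the loop prints clockspring-admins and clockspring-operators.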
This provider requires an Azure app registration with:
- Microsoft Graph Group.Read.All and User.Read.All API permissions with admin consent
- A client secret or application password
- ID token claims for upn and/or email
The AzureGraphUserGroupProvider has the following properties:
Property Name | Description |
---|---|
|
Duration of delay between each user and group refresh. Default is |
|
The endpoint of the Azure AD login. This can be found in the Azure portal under Azure Active Directory → App registrations → [application name] → Endpoints. For example, the global authority endpoint is https://login.microsoftonline.com. |
|
Tenant ID or Directory ID of the Azure AD tenant. This can be found in the Azure portal under Azure Active Directory → App registrations → [application name] → Directory (tenant) ID. |
|
Client ID or Application ID of the Azure app registration. This can be found in the Azure portal under Azure Active Directory → App registrations → [application name] → Overview → Application (client) ID. |
|
A client secret from the Azure app registration. Secrets can be created in the Azure portal under Azure Active Directory → App registrations → [application name] → Certificates & secrets → Client secrets → [+] New client secret. |
|
Prefix filter for Azure AD groups. Matches against the group displayName to retrieve only groups with names starting with the provided prefix. |
|
Suffix filter for Azure AD groups. Matches against the group displayName to retrieve only groups with names ending with the provided suffix. |
|
Substring filter for Azure AD groups. Matches against the group displayName to retrieve only groups with names containing the provided substring. |
|
Comma-separated list of Azure AD groups. If no string-based matching filter (i.e., prefix, suffix, and substring) is specified, set this property to avoid fetching all groups and users in the Azure AD tenant. |
|
Page size to use with the Microsoft Graph API. Set to 0 to disable paging API calls. Default: 50, Max: 999. |
|
The property of the user directory object mapped to the user name field. Default is 'upn'. 'email' is another option when |
Like LdapUserGroupProvider and ShellUserGroupProvider, the AzureGraphUserGroupProvider configuration is commented out in the authorizers.xml file. Refer to the comment for a starter configuration.
Composite Implementations
Another option for the UserGroupProvider is a composite implementation, in which multiple sources or implementations are configured and composed. For instance, an admin can configure users and groups to be loaded from both a file and a directory server. There are two composite implementations: one that supports multiple UserGroupProviders, and one that supports multiple UserGroupProviders plus a single configurable UserGroupProvider.
The CompositeUserGroupProvider will provide support for retrieving users and groups from multiple sources. The CompositeUserGroupProvider has the following property:
Property Name | Description |
---|---|
|
The identifier of user group providers to load from. The name of each property must be unique, for example: "User Group Provider A", "User Group Provider B", "User Group Provider C" or "User Group Provider 1", "User Group Provider 2", "User Group Provider 3" |
Any identity mapping rules specified in clockspring.properties are not applied in this implementation. This behavior would need to be applied by the base implementation.
The CompositeConfigurableUserGroupProvider will provide support for retrieving users and groups from multiple sources. Additionally, a single configurable user group provider is required. Users from the configurable user group provider are configurable, however users loaded from one of the User Group Provider [unique key] will not be. The CompositeConfigurableUserGroupProvider has the following properties:
Property Name | Description |
---|---|
|
A configurable user group provider. |
|
The identifier of user group providers to load from. The name of each property must be unique, for example: "User Group Provider A", "User Group Provider B", "User Group Provider C" or "User Group Provider 1", "User Group Provider 2", "User Group Provider 3" |
FileAccessPolicyProvider
The default AccessPolicyProvider is the FileAccessPolicyProvider; however, you can develop additional AccessPolicyProviders as extensions. The FileAccessPolicyProvider has the following properties:
Property Name | Description |
---|---|
|
The identifier for an User Group Provider defined above that will be used to access users and groups for use in the managed access policies. |
|
The file where the FileAccessPolicyProvider will store policies. |
|
The identity of an initial admin user that will be granted access to the UI and given the ability to create additional users, groups, and policies. The value of this property could be a DN when using certificates or LDAP, or a Kerberos principal. This property will only be used when there are no other policies defined. If this property is specified then a Legacy Authorized Users File can not be specified. |
|
The full path to an existing authorized-users.xml that will be automatically converted to the new authorizations model. If this property is specified then an Initial Admin Identity can not be specified, and this property will only be used when there are no other users, groups, and policies defined. |
|
The identity of a cluster node. When clustered, a property for each node should be defined, so that every node knows about every other node. If not clustered these properties can be ignored. The name of each property must be unique, for example for a three node cluster: "Node Identity A", "Node Identity B", "Node Identity C" or "Node Identity 1", "Node Identity 2", "Node Identity 3" |
|
The name of a group containing cluster nodes. The typical use for this is when nodes are dynamically added/removed from the cluster. |
The identities configured in the Initial Admin Identity, the Node Identity properties, or discovered in a Legacy Authorized Users File must be available in the configured User Group Provider. |
Any users in the legacy users file must be found in the configured User Group Provider. |
Any identity mapping rules specified in clockspring.properties will also be applied to the node identities, so the values should be the unmapped identities (i.e. full DN from a certificate). This identity must be found in the configured User Group Provider. |
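For clusters whose membership changes over time, a group of nodes can stand in for individual node identities. The fragment below is a minimal sketch of that setup. It assumes the group property is named Node Group, as in upstream Apache NiFi, and it uses a hypothetical group called cluster-nodes that would need to exist in the configured User Group Provider.

```xml
<accessPolicyProvider>
    <identifier>file-access-policy-provider</identifier>
    <class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
    <property name="User Group Provider">ldap-user-group-provider</property>
    <property name="Authorizations File">./conf/authorizations.xml</property>
    <property name="Initial Admin Identity">John Smith</property>
    <property name="Legacy Authorized Users File"></property>
    <!-- Assumed property name; "cluster-nodes" is a hypothetical group of node identities -->
    <property name="Node Group">cluster-nodes</property>
</accessPolicyProvider>
```

With this arrangement, adding a node to the group in the directory would take the place of adding a new Node Identity property.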
StandardManagedAuthorizer
The default authorizer is the StandardManagedAuthorizer; however, you can develop additional authorizers as extensions. The StandardManagedAuthorizer has the following property:
Property Name | Description |
---|---|
Access Policy Provider | The identifier for an Access Policy Provider defined above. |
FileAuthorizer
The FileAuthorizer has been replaced with the more granular StandardManagedAuthorizer approach described above. However, it is still available for backwards compatibility reasons. The FileAuthorizer has the following properties:
Property Name | Description |
---|---|
Authorizations File | The file where the FileAuthorizer stores policies. By default, the authorizations.xml in the |
Users File | The file where the FileAuthorizer stores users and groups. By default, the users.xml in the |
Initial Admin Identity | The identity of an initial admin user that is granted access to the UI and given the ability to create additional users, groups, and policies. This property is only used when there are no other users, groups, and policies defined. |
Legacy Authorized Users File | The full path to an existing authorized-users.xml that is automatically converted to the multi-tenant authorization model. This property is only used when there are no other users, groups, and policies defined. |
Node Identity [unique key] | The identity of a cluster node. When clustered, a property for each node should be defined, so that every node knows about every other node. If not clustered, these properties can be ignored. |
Any identity mapping rules specified in clockspring.properties will also be applied to the initial admin identity, so the value should be the unmapped identity. |
Any identity mapping rules specified in clockspring.properties will also be applied to the node identities, so the values should be the unmapped identities (i.e. full DN from a certificate). |
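For completeness, a legacy FileAuthorizer entry might look like the sketch below. The class name org.apache.nifi.authorization.FileAuthorizer and the property names are assumptions based on the package naming and properties used elsewhere in this guide, and the identifier file-provider is hypothetical.

```xml
<authorizers>
    <authorizer>
        <!-- Hypothetical identifier; class and property names are assumed, not confirmed by this guide -->
        <identifier>file-provider</identifier>
        <class>org.apache.nifi.authorization.FileAuthorizer</class>
        <property name="Authorizations File">./conf/authorizations.xml</property>
        <property name="Users File">./conf/users.xml</property>
        <property name="Initial Admin Identity">cn=John Smith,ou=people,dc=example,dc=com</property>
        <property name="Legacy Authorized Users File"></property>
        <property name="Node Identity 1"></property>
    </authorizer>
</authorizers>
```

Unlike the StandardManagedAuthorizer examples, a single entry here holds both the user/group store and the policy store.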
Initial Admin Identity (New Instance)
If you are setting up a secured instance for the first time, you must manually designate an Initial Admin Identity in the authorizers.xml file. This initial admin user is granted access to the UI and given the ability to create additional users, groups, and policies. The value of this property could be a DN (when using certificates or LDAP) or a Kerberos principal. If you are the administrator, add yourself as the Initial Admin Identity.
After you have edited and saved the authorizers.xml file, restart Clockspring. The Initial Admin Identity user and administrative policies are added to the users.xml and authorizations.xml files during restart. Once Clockspring starts, the Initial Admin Identity user is able to access the UI and begin managing users, groups, and policies.
For a brand new secure flow, providing the "Initial Admin Identity" gives that user access to get into the UI and to manage users, groups and policies. If that user wants to start modifying the flow they need to grant themselves policies for the root process group. The system is unable to do this automatically because in a new flow the UUID of the root process group is not permanent until the flow.xml.gz is generated. If the instance is an upgrade from an existing flow.xml.gz the "Initial Admin Identity" user is automatically given the privileges to modify the flow. |
Some common use cases are described below.
File-based (LDAP Authentication)
Here is an example LDAP entry using the name John Smith:

    <authorizers>
        <userGroupProvider>
            <identifier>file-user-group-provider</identifier>
            <class>org.apache.nifi.authorization.FileUserGroupProvider</class>
            <property name="Users File">./conf/users.xml</property>
            <property name="Legacy Authorized Users File"></property>
            <property name="Initial User Identity 1">cn=John Smith,ou=people,dc=example,dc=com</property>
        </userGroupProvider>
        <accessPolicyProvider>
            <identifier>file-access-policy-provider</identifier>
            <class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
            <property name="User Group Provider">file-user-group-provider</property>
            <property name="Authorizations File">./conf/authorizations.xml</property>
            <property name="Initial Admin Identity">cn=John Smith,ou=people,dc=example,dc=com</property>
            <property name="Legacy Authorized Users File"></property>
            <property name="Node Identity 1"></property>
        </accessPolicyProvider>
        <authorizer>
            <identifier>managed-authorizer</identifier>
            <class>org.apache.nifi.authorization.StandardManagedAuthorizer</class>
            <property name="Access Policy Provider">file-access-policy-provider</property>
        </authorizer>
    </authorizers>
File-based (Kerberos Authentication)
Here is an example Kerberos entry using the name John Smith and realm NIFI.APACHE.ORG:

    <authorizers>
        <userGroupProvider>
            <identifier>file-user-group-provider</identifier>
            <class>org.apache.nifi.authorization.FileUserGroupProvider</class>
            <property name="Users File">./conf/users.xml</property>
            <property name="Legacy Authorized Users File"></property>
            <property name="Initial User Identity 1">johnsmith@NIFI.APACHE.ORG</property>
        </userGroupProvider>
        <accessPolicyProvider>
            <identifier>file-access-policy-provider</identifier>
            <class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
            <property name="User Group Provider">file-user-group-provider</property>
            <property name="Authorizations File">./conf/authorizations.xml</property>
            <property name="Initial Admin Identity">johnsmith@NIFI.APACHE.ORG</property>
            <property name="Legacy Authorized Users File"></property>
            <property name="Node Identity 1"></property>
        </accessPolicyProvider>
        <authorizer>
            <identifier>managed-authorizer</identifier>
            <class>org.apache.nifi.authorization.StandardManagedAuthorizer</class>
            <property name="Access Policy Provider">file-access-policy-provider</property>
        </authorizer>
    </authorizers>
LDAP-based Users/Groups Referencing User DN
Here is an example loading users and groups from LDAP. Group membership will be driven through the member attribute of each group. Authorization will still use file-based access policies:
    dn: cn=User 1,ou=users,o=clockspring
    objectClass: organizationalPerson
    objectClass: person
    objectClass: inetOrgPerson
    objectClass: top
    cn: User 1
    sn: User1
    uid: user1

    dn: cn=User 2,ou=users,o=clockspring
    objectClass: organizationalPerson
    objectClass: person
    objectClass: inetOrgPerson
    objectClass: top
    cn: User 2
    sn: User2
    uid: user2

    dn: cn=admins,ou=groups,o=clockspring
    objectClass: groupOfNames
    objectClass: top
    cn: admins
    member: cn=User 1,ou=users,o=clockspring
    member: cn=User 2,ou=users,o=clockspring

    <authorizers>
        <userGroupProvider>
            <identifier>ldap-user-group-provider</identifier>
            <class>org.apache.nifi.ldap.tenants.LdapUserGroupProvider</class>
            <property name="Authentication Strategy">ANONYMOUS</property>
            <property name="Manager DN"></property>
            <property name="Manager Password"></property>
            <property name="TLS - Keystore"></property>
            <property name="TLS - Keystore Password"></property>
            <property name="TLS - Keystore Type"></property>
            <property name="TLS - Truststore"></property>
            <property name="TLS - Truststore Password"></property>
            <property name="TLS - Truststore Type"></property>
            <property name="TLS - Client Auth"></property>
            <property name="TLS - Protocol"></property>
            <property name="TLS - Shutdown Gracefully"></property>
            <property name="Referral Strategy">FOLLOW</property>
            <property name="Connect Timeout">10 secs</property>
            <property name="Read Timeout">10 secs</property>
            <property name="Url">ldap://localhost:10389</property>
            <property name="Page Size"></property>
            <property name="Sync Interval">30 mins</property>
            <property name="Group Membership - Enforce Case Sensitivity">false</property>
            <property name="User Search Base">ou=users,o=clockspring</property>
            <property name="User Object Class">person</property>
            <property name="User Search Scope">ONE_LEVEL</property>
            <property name="User Search Filter"></property>
            <property name="User Identity Attribute">cn</property>
            <property name="User Group Name Attribute"></property>
            <property name="User Group Name Attribute - Referenced Group Attribute"></property>
            <property name="Group Search Base">ou=groups,o=clockspring</property>
            <property name="Group Object Class">groupOfNames</property>
            <property name="Group Search Scope">ONE_LEVEL</property>
            <property name="Group Search Filter"></property>
            <property name="Group Name Attribute">cn</property>
            <property name="Group Member Attribute">member</property>
            <property name="Group Member Attribute - Referenced User Attribute"></property>
        </userGroupProvider>
        <accessPolicyProvider>
            <identifier>file-access-policy-provider</identifier>
            <class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
            <property name="User Group Provider">ldap-user-group-provider</property>
            <property name="Authorizations File">./conf/authorizations.xml</property>
            <property name="Initial Admin Identity">John Smith</property>
            <property name="Legacy Authorized Users File"></property>
            <property name="Node Identity 1"></property>
        </accessPolicyProvider>
        <authorizer>
            <identifier>managed-authorizer</identifier>
            <class>org.apache.nifi.authorization.StandardManagedAuthorizer</class>
            <property name="Access Policy Provider">file-access-policy-provider</property>
        </authorizer>
    </authorizers>
The Initial Admin Identity value would have been loaded from the cn of John Smith’s entry, based on the User Identity Attribute value.
LDAP-based Users/Groups Referencing User Attribute
Here is an example loading users and groups from LDAP. Group membership will be driven through the memberUid attribute of each group. Authorization will still use file-based access policies:
    dn: uid=User 1,ou=Users,dc=local
    objectClass: inetOrgPerson
    objectClass: posixAccount
    objectClass: shadowAccount
    uid: user1
    cn: User 1

    dn: uid=User 2,ou=Users,dc=local
    objectClass: inetOrgPerson
    objectClass: posixAccount
    objectClass: shadowAccount
    uid: user2
    cn: User 2

    dn: cn=Managers,ou=Groups,dc=local
    objectClass: posixGroup
    cn: Managers
    memberUid: user1
    memberUid: user2

    <authorizers>
        <userGroupProvider>
            <identifier>ldap-user-group-provider</identifier>
            <class>org.apache.nifi.ldap.tenants.LdapUserGroupProvider</class>
            <property name="Authentication Strategy">ANONYMOUS</property>
            <property name="Manager DN"></property>
            <property name="Manager Password"></property>
            <property name="TLS - Keystore"></property>
            <property name="TLS - Keystore Password"></property>
            <property name="TLS - Keystore Type"></property>
            <property name="TLS - Truststore"></property>
            <property name="TLS - Truststore Password"></property>
            <property name="TLS - Truststore Type"></property>
            <property name="TLS - Client Auth"></property>
            <property name="TLS - Protocol"></property>
            <property name="TLS - Shutdown Gracefully"></property>
            <property name="Referral Strategy">FOLLOW</property>
            <property name="Connect Timeout">10 secs</property>
            <property name="Read Timeout">10 secs</property>
            <property name="Url">ldap://localhost:10389</property>
            <property name="Page Size"></property>
            <property name="Sync Interval">30 mins</property>
            <property name="Group Membership - Enforce Case Sensitivity">false</property>
            <property name="User Search Base">ou=Users,dc=local</property>
            <property name="User Object Class">posixAccount</property>
            <property name="User Search Scope">ONE_LEVEL</property>
            <property name="User Search Filter"></property>
            <property name="User Identity Attribute">cn</property>
            <property name="User Group Name Attribute"></property>
            <property name="User Group Name Attribute - Referenced Group Attribute"></property>
            <property name="Group Search Base">ou=Groups,dc=local</property>
            <property name="Group Object Class">posixGroup</property>
            <property name="Group Search Scope">ONE_LEVEL</property>
            <property name="Group Search Filter"></property>
            <property name="Group Name Attribute">cn</property>
            <property name="Group Member Attribute">memberUid</property>
            <property name="Group Member Attribute - Referenced User Attribute">uid</property>
        </userGroupProvider>
        <accessPolicyProvider>
            <identifier>file-access-policy-provider</identifier>
            <class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
            <property name="User Group Provider">ldap-user-group-provider</property>
            <property name="Authorizations File">./conf/authorizations.xml</property>
            <property name="Initial Admin Identity">John Smith</property>
            <property name="Legacy Authorized Users File"></property>
            <property name="Node Identity 1"></property>
        </accessPolicyProvider>
        <authorizer>
            <identifier>managed-authorizer</identifier>
            <class>org.apache.nifi.authorization.StandardManagedAuthorizer</class>
            <property name="Access Policy Provider">file-access-policy-provider</property>
        </authorizer>
    </authorizers>
Composite - File and LDAP-based Users/Groups
Here is an example composite implementation loading users and groups from LDAP and a local file. Group membership will be driven through the member attribute of each group. The users from LDAP will be read-only, while the users loaded from the file will be configurable in the UI.
    dn: cn=User 1,ou=users,o=clockspring
    objectClass: organizationalPerson
    objectClass: person
    objectClass: inetOrgPerson
    objectClass: top
    cn: User 1
    sn: User1
    uid: user1

    dn: cn=User 2,ou=users,o=clockspring
    objectClass: organizationalPerson
    objectClass: person
    objectClass: inetOrgPerson
    objectClass: top
    cn: User 2
    sn: User2
    uid: user2

    dn: cn=admins,ou=groups,o=clockspring
    objectClass: groupOfNames
    objectClass: top
    cn: admins
    member: cn=User 1,ou=users,o=clockspring
    member: cn=User 2,ou=users,o=clockspring

    <authorizers>
        <userGroupProvider>
            <identifier>file-user-group-provider</identifier>
            <class>org.apache.nifi.authorization.FileUserGroupProvider</class>
            <property name="Users File">./conf/users.xml</property>
            <property name="Legacy Authorized Users File"></property>
            <property name="Initial User Identity 1">cn=clockspring-node1,ou=servers,dc=example,dc=com</property>
            <property name="Initial User Identity 2">cn=clockspring-node2,ou=servers,dc=example,dc=com</property>
        </userGroupProvider>
        <userGroupProvider>
            <identifier>ldap-user-group-provider</identifier>
            <class>org.apache.nifi.ldap.tenants.LdapUserGroupProvider</class>
            <property name="Authentication Strategy">ANONYMOUS</property>
            <property name="Manager DN"></property>
            <property name="Manager Password"></property>
            <property name="TLS - Keystore"></property>
            <property name="TLS - Keystore Password"></property>
            <property name="TLS - Keystore Type"></property>
            <property name="TLS - Truststore"></property>
            <property name="TLS - Truststore Password"></property>
            <property name="TLS - Truststore Type"></property>
            <property name="TLS - Client Auth"></property>
            <property name="TLS - Protocol"></property>
            <property name="TLS - Shutdown Gracefully"></property>
            <property name="Referral Strategy">FOLLOW</property>
            <property name="Connect Timeout">10 secs</property>
            <property name="Read Timeout">10 secs</property>
            <property name="Url">ldap://localhost:10389</property>
            <property name="Page Size"></property>
            <property name="Sync Interval">30 mins</property>
            <property name="Group Membership - Enforce Case Sensitivity">false</property>
            <property name="User Search Base">ou=users,o=clockspring</property>
            <property name="User Object Class">person</property>
            <property name="User Search Scope">ONE_LEVEL</property>
            <property name="User Search Filter"></property>
            <property name="User Identity Attribute">cn</property>
            <property name="User Group Name Attribute"></property>
            <property name="User Group Name Attribute - Referenced Group Attribute"></property>
            <property name="Group Search Base">ou=groups,o=clockspring</property>
            <property name="Group Object Class">groupOfNames</property>
            <property name="Group Search Scope">ONE_LEVEL</property>
            <property name="Group Search Filter"></property>
            <property name="Group Name Attribute">cn</property>
            <property name="Group Member Attribute">member</property>
            <property name="Group Member Attribute - Referenced User Attribute"></property>
        </userGroupProvider>
        <userGroupProvider>
            <identifier>composite-user-group-provider</identifier>
            <class>org.apache.nifi.authorization.CompositeConfigurableUserGroupProvider</class>
            <property name="Configurable User Group Provider">file-user-group-provider</property>
            <property name="User Group Provider 1">ldap-user-group-provider</property>
        </userGroupProvider>
        <accessPolicyProvider>
            <identifier>file-access-policy-provider</identifier>
            <class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
            <property name="User Group Provider">composite-user-group-provider</property>
            <property name="Authorizations File">./conf/authorizations.xml</property>
            <property name="Initial Admin Identity">John Smith</property>
            <property name="Legacy Authorized Users File"></property>
            <property name="Node Identity 1">cn=clockspring-node1,ou=servers,dc=example,dc=com</property>
            <property name="Node Identity 2">cn=clockspring-node2,ou=servers,dc=example,dc=com</property>
        </accessPolicyProvider>
        <authorizer>
            <identifier>managed-authorizer</identifier>
            <class>org.apache.nifi.authorization.StandardManagedAuthorizer</class>
            <property name="Access Policy Provider">file-access-policy-provider</property>
        </authorizer>
    </authorizers>
In this example, the users and groups are loaded from LDAP but the servers are managed in a local file. The Initial Admin Identity value came from an attribute in an LDAP entry based on the User Identity Attribute. The Node Identity values are established in the local file using the Initial User Identity properties.
Do not manually edit the authorizations.xml file. Create authorizations only during initial setup. |
Cluster Node Identities
If you are running a clustered environment you must specify the identities for each node. The authorization policies required for the nodes to communicate are created during startup.
For example, if you are setting up a two-node cluster with the following DNs for each node:

    cn=clockspring-1,ou=people,dc=example,dc=com
    cn=clockspring-2,ou=people,dc=example,dc=com

the authorizers.xml file would look like this:

    <authorizers>
        <userGroupProvider>
            <identifier>file-user-group-provider</identifier>
            <class>org.apache.nifi.authorization.FileUserGroupProvider</class>
            <property name="Users File">./conf/users.xml</property>
            <property name="Legacy Authorized Users File"></property>
            <property name="Initial User Identity 1">johnsmith@NIFI.APACHE.ORG</property>
            <property name="Initial User Identity 2">cn=clockspring-1,ou=people,dc=example,dc=com</property>
            <property name="Initial User Identity 3">cn=clockspring-2,ou=people,dc=example,dc=com</property>
        </userGroupProvider>
        <accessPolicyProvider>
            <identifier>file-access-policy-provider</identifier>
            <class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
            <property name="User Group Provider">file-user-group-provider</property>
            <property name="Authorizations File">./conf/authorizations.xml</property>
            <property name="Initial Admin Identity">johnsmith@NIFI.APACHE.ORG</property>
            <property name="Legacy Authorized Users File"></property>
            <property name="Node Identity 1">cn=clockspring-1,ou=people,dc=example,dc=com</property>
            <property name="Node Identity 2">cn=clockspring-2,ou=people,dc=example,dc=com</property>
        </accessPolicyProvider>
        <authorizer>
            <identifier>managed-authorizer</identifier>
            <class>org.apache.nifi.authorization.StandardManagedAuthorizer</class>
            <property name="Access Policy Provider">file-access-policy-provider</property>
        </authorizer>
    </authorizers>
In a cluster, all nodes must have the same authorizations.xml and users.xml. The only exception is if a node has empty authorizations.xml and users.xml files prior to joining the cluster. In this scenario, the node inherits them from the cluster during startup. |
Now that initial authorizations have been created, additional users, groups and authorizations can be created and managed in the UI.
Configuring Users & Access Policies
Depending on the capabilities of the configured UserGroupProvider and AccessPolicyProvider, the users, groups, and policies will be configurable in the UI. If the extensions are not configurable, the users, groups, and policies will be read-only in the UI. If the configured authorizer does not use UserGroupProvider and AccessPolicyProvider, the users and policies may or may not be visible and configurable in the UI, depending on the underlying implementation.
This section assumes the users, groups, and policies are configurable in the UI and describes:
-
How to create users and groups
-
How access policies are used to define authorizations
-
How to view policies that are set on a user
-
How to configure access policies by walking through specific examples
Instructions requiring interaction with the UI assume the application is being accessed by User1, a user with administrator privileges, such as the Initial Admin Identity user or a converted legacy admin user (see Authorizers.xml Setup). |
Creating Users and Groups
From the UI, select Users from the Global Menu. This opens a dialog to create and manage users and groups.
Click the Add icon. To create a user, enter the 'Identity' information relevant to the authentication method chosen to secure your Clockspring instance. Click OK.
To create a group, select the Group radio button, enter the name of the group and select the users to be included in the group. Click OK.
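When the FileUserGroupProvider is in use, users and groups created this way are persisted to the configured Users File. The sketch below shows roughly what that file might contain after adding one user and one group. The element layout is assumed from upstream Apache NiFi, the group name admins is just an example, and the identifiers are readable placeholders for generated UUIDs. It is meant for inspection only; make changes through the UI.

```xml
<tenants>
    <groups>
        <!-- A group lists its members by user identifier (placeholder values) -->
        <group identifier="admins-group-id" name="admins">
            <user identifier="user1-id"/>
        </group>
    </groups>
    <users>
        <!-- The identity must match what the authentication method presents, e.g. a DN -->
        <user identifier="user1-id" identity="cn=John Smith,ou=people,dc=example,dc=com"/>
    </users>
</tenants>
```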
Access Policies
You can manage the ability for users and groups to view or modify resources using 'access policies'. There are two types of access policies that can be applied to a resource:
-
View — If a view policy is created for a resource, only the users or groups that are added to that policy are able to see the details of that resource.
-
Modify — If a resource has a modify policy, only the users or groups that are added to that policy can change the configuration of that resource.
You can create and apply access policies on both global and component levels.
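As an illustration of the two policy types, the sketch below shows how a view policy and a modify policy on the same processor might appear in the Authorizations File written by the FileAccessPolicyProvider. The layout, the resource string format, and the R/W action codes are assumed from upstream Apache NiFi, and all identifiers are placeholders. The file should be read, not edited by hand.

```xml
<authorizations>
    <policies>
        <!-- View policy (assumed action code "R"): members may see the resource's details -->
        <policy identifier="policy-view-id" resource="/processors/processor-id" action="R">
            <group identifier="operators-group-id"/>
            <user identifier="user1-id"/>
        </policy>
        <!-- Modify policy (assumed action code "W"): members may change the resource's configuration -->
        <policy identifier="policy-modify-id" resource="/processors/processor-id" action="W">
            <user identifier="user1-id"/>
        </policy>
    </policies>
</authorizations>
```

Here the hypothetical operators group can see the processor, but only the user with identifier user1-id can change it.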
Global Access Policies
Global access policies govern the following system level authorizations:
Policy | Privilege | Global Menu Selection | Resource Descriptor |
---|---|---|---|
view the UI | Allows users to view the UI | N/A | |
access the controller | Allows users to view/modify the controller including Reporting Tasks, Controller Services, Parameter Contexts and Nodes in the Cluster | Controller Settings | |
access parameter contexts | Allows users to view/modify Parameter Contexts. Access to Parameter Contexts is inherited from the "access the controller" policies unless overridden. | Parameter Contexts | |
query provenance | Allows users to submit a Provenance Search and request Event Lineage | Data Provenance | |
access restricted components | Allows users to create/modify restricted components assuming other permissions are sufficient. The restricted components may indicate which specific permissions are required. Permissions can be granted for specific restrictions or be granted regardless of restrictions. If permission is granted regardless of restrictions, the user can create/modify all restricted components. | N/A | |
access all policies | Allows users to view/modify the policies for all components | Policies | |
access users/user groups | Allows users to view/modify the users and user groups | Users | |
retrieve site-to-site details | Allows other instances to retrieve Site-To-Site details | N/A | |
view system diagnostics | Allows users to view System Diagnostics | Summary | |
proxy user requests | Allows proxy machines to send requests on behalf of others | N/A | |
access counters | Allows users to view/modify Counters | Counters | |
Component Level Access Policies
Component level access policies govern the following component level authorizations:
Policy | Privilege | Resource Descriptor & Action |
---|---|---|
view the component | Allows users to view component configuration details | |
modify the component | Allows users to modify component configuration details | |
operate the component | Allows users to operate components by changing component run status (start/stop/enable/disable), remote port transmission status, or terminating processor threads | |
view provenance | Allows users to view provenance events generated by this component | |
view the data | Allows users to view metadata and content for this component in flowfile queues in outbound connections and through provenance events | |
modify the data | Allows users to empty flowfile queues in outbound connections and submit replays through provenance events | |
view the policies | Allows users to view the list of users who can view/modify a component | |
modify the policies | Allows users to modify the list of users who can view/modify a component | |
receive data via site-to-site | Allows a port to receive data from other instances | |
send data via site-to-site | Allows a port to send data to other instances | |
You can apply access policies to all component types except connections. Connection authorizations are inferred by the individual access policies on the source and destination components of the connection, as well as the access policy of the process group containing the components. This is discussed in more detail in the Creating a Connection and Editing a Connection examples below. |
In order to access List Queue or Delete Queue for a connection, a user requires permission to the "view the data" and "modify the data" policies on the component. In a clustered environment, all nodes must be added to these policies as well, as a user request could be replicated through any node in the cluster. |
Access Policy Inheritance
An administrator does not need to manually create policies for every component in the dataflow. To reduce the amount of time admins spend on authorization management, policies are inherited from parent resource to child resource. For example, if a user is given access to view and modify a process group, that user can also view and modify the components in the process group. Policy inheritance enables an administrator to assign policies at one time and have the policies apply throughout the entire dataflow.
You can override an inherited policy (as described in the Moving a Processor example below). Overriding a policy removes the inherited policy, breaking the chain of inheritance from parent to child, and creates a replacement policy to add users as desired. Inherited policies and their users can be restored by deleting the replacement policy.
The view the policies and modify the policies component-level access policies are an exception to this inherited behavior. When a user is added to either policy, they are added to the current list of administrators. They do not override higher level administrators. For this reason, only component-specific administrators are displayed for the view the policies and modify the policies access policies. |
You cannot modify the users/groups on an inherited policy. Users and groups can only be added or removed from a parent policy or an override policy. |
Viewing Policies on Users
From the UI, select Users from the Global Menu. This opens the Users dialog.
Select the View User Policies icon.
The User Policies window displays the global and component level policies that have been set for the chosen user. Select the Go To icon to navigate to that component in the canvas.
Access Policy Configuration Examples
The most effective way to understand how to create and apply access policies is to walk through some common examples. The following scenarios assume User1 is an administrator and User2 is a newly added user that has only been given access to the UI.
Let's begin with two processors on the canvas as our starting point: GenerateFlowFile and LogAttribute.
User1 can add components to the dataflow and is able to move, edit and connect all processors. The details and properties of the root process group and processors are visible to User1.
User1 wants to maintain their current privileges to the dataflow and its components.
User2 is unable to add components to the dataflow or move, edit, or connect components. The details and properties of the root process group and processors are hidden from User2.
Moving a Processor
To allow User2 to move the GenerateFlowFile processor in the dataflow and only that processor, User1 performs the following steps:
-
Select the GenerateFlowFile processor so that it is highlighted.
-
Select the Access Policies icon from the Operate palette. The Access Policies dialog opens.
-
Select modify the component from the policy drop-down. The modify the component policy that currently exists on the processor (child) is the modify the component policy inherited from the root process group (parent) on which User1 has privileges.
-
Select the Override link in the policy inheritance message. When creating the replacement policy, you are given a choice to override with a copy of the inherited policy or an empty policy. Select the Override button to create a copy.
-
On the replacement policy that is created, select the Add User icon. Find or enter User2 in the User Identity field and select OK. With these changes, User1 maintains the ability to move both processors on the canvas. User2 can now move the GenerateFlowFile processor but cannot move the LogAttribute processor.
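To make the effect of the override concrete, here is a hedged sketch of how the resulting policies might be stored by the FileAccessPolicyProvider. The element layout, the resource strings, and the action="W" encoding for modify are assumed from upstream Apache NiFi rather than taken from this guide, and every identifier is a readable placeholder for a generated UUID. It is shown for inspection only; the change itself is made in the UI as described in the steps above.

```xml
<authorizations>
    <policies>
        <!-- Parent policy on the root process group: still lists only User1 -->
        <policy identifier="policy-root-modify" resource="/process-groups/root-group-id" action="W">
            <user identifier="user1-id"/>
        </policy>
        <!-- Replacement (override) policy on GenerateFlowFile: copy of the parent plus User2 -->
        <policy identifier="policy-generateflowfile-modify" resource="/processors/generateflowfile-id" action="W">
            <user identifier="user1-id"/>
            <user identifier="user2-id"/>
        </policy>
    </policies>
</authorizations>
```

LogAttribute has no entry of its own in this sketch, so it keeps inheriting the root process group policy, which is why User2 still cannot move it.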
Editing a Processor
In the Moving a Processor example above, User2 was added to the modify the component policy for GenerateFlowFile. Without the ability to view the processor properties, User2 is unable to modify the processor's configuration. In order to edit a component, a user must be on both the view the component and modify the component policies. To implement this, User1 performs the following steps:
-
Select the GenerateFlowFile processor.
-
Select the Access Policies icon from the Operate palette. The Access Policies dialog opens.
-
Select view the component from the policy drop-down. The view the component policy that currently exists on the processor (child) is the view the component policy inherited from the root process group (parent) on which User1 has privileges.
-
Select the Override link in the policy inheritance message, keep the default of Copy policy and select the Override button.
-
On the override policy that is created, select the Add User icon. Find or enter User2 in the User Identity field and select OK. With these changes, User1 maintains the ability to view and edit the processors on the canvas. User2 can now view and edit the GenerateFlowFile processor.
Creating a Connection
With the access policies configured as discussed in the previous two examples, User1 is able to connect GenerateFlowFile to LogAttribute:
User2 cannot make the connection:
This is because:
-
User2 does not have modify access on the process group.
-
Even though User2 has view and modify access to the source component (GenerateFlowFile), User2 does not have an access policy on the destination component (LogAttribute).
To allow User2 to connect GenerateFlowFile to LogAttribute, as User1:
-
Select the root process group. The Operate palette is updated with details for the root process group.
-
Select the Access Policies icon from the Operate palette. The Access Policies dialog opens.
-
Select modify the component from the policy drop-down.
-
Select the Add User icon. Find or enter User2 and select OK.
By adding User2 to the modify the component policy on the process group, User2 is added to the modify the component policy on the LogAttribute processor by policy inheritance. To confirm this, highlight the LogAttribute processor and select the Access Policies icon from the Operate palette:
With these changes, User2 can now connect the GenerateFlowFile processor to the LogAttribute processor.
Editing a Connection
Assume User1 or User2 adds a ReplaceText processor to the root process group:
User1 can select and change the existing connection (between GenerateFlowFile and LogAttribute) to now connect GenerateFlowFile to ReplaceText:
User2 is unable to perform this action.
To allow User2 to connect GenerateFlowFile to ReplaceText, as User1:
-
Select the root process group. The Operate palette is updated with details for the root process group.
-
Select the Access Policies icon.
-
Select view the component from the policy drop-down.
-
Select the Add User icon. Find or enter User2 and select OK.
Being added to both the view and modify policies for the process group, User2 can now connect the GenerateFlowFile processor to the ReplaceText processor.
Encrypted Passwords in Flows
Clockspring always stores all sensitive values (passwords, tokens, and other credentials) populated into a flow in an encrypted format on disk.
The encryption algorithm used is specified by nifi.sensitive.props.algorithm and the password from which the encryption key is derived is specified by nifi.sensitive.props.key in nifi.properties.
All options require a password (nifi.sensitive.props.key value) of at least 12 characters.
Clustered installations of Clockspring require the same value to be configured on all nodes.
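The two constraints above (a key of at least 12 characters, and the same value on every node) lend themselves to a quick pre-flight check. The following is a minimal sketch; the function name and the idea of collecting each node's key into a dictionary are assumptions made for illustration, not part of Clockspring.

```python
MIN_KEY_LENGTH = 12  # from the text: at least 12 characters


def check_sensitive_props_key(keys_by_node: dict) -> list:
    """Return a list of problems found; an empty list means the check passed.

    keys_by_node maps a node name to the nifi.sensitive.props.key value
    read from that node's configuration (how it is read is out of scope).
    """
    problems = []
    for node, key in keys_by_node.items():
        if len(key) < MIN_KEY_LENGTH:
            problems.append(f"{node}: key is shorter than {MIN_KEY_LENGTH} characters")
    # Clustered installations need the same value on all nodes.
    if len(set(keys_by_node.values())) > 1:
        problems.append("nodes do not share the same key value")
    return problems


print(check_sensitive_props_key({"node1": "a" * 12, "node2": "a" * 12}))
```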
Encrypted Passwords in Configuration Files
In order to facilitate the secure setup of Clockspring, you can use the encrypt-config command line utility to encrypt raw configuration values that Clockspring decrypts in memory on startup. This extensible protection scheme transparently allows Clockspring to use raw values in operation, while protecting them at rest.
If no administrator action is taken, the configuration values remain unencrypted.
Configuring each Sensitive Property Provider requires including the appropriate file reference property in bootstrap.conf. The default bootstrap.conf includes commented file reference properties for available providers.
HashiCorp Vault providers
Two encryption providers are currently configurable in the bootstrap-hashicorp-vault.conf file:
Provider | Provider Identifier | Description |
---|---|---|
HashiCorp Vault Transit provider |
|
Uses HashiCorp Vault’s Transit Secrets Engine to decrypt sensitive properties. |
HashiCorp Vault Key/Value provider |
|
Retrieves sensitive values from Secrets stored in a HashiCorp Vault Key/Value (unversioned) Secrets Engine. |
Note that all HashiCorp Vault encryption providers require a running Vault instance in order to decrypt these values at startup.
The following configuration properties are available inside the bootstrap-hashicorp-vault.conf file:
Required properties
Property Name | Description | Default |
---|---|---|
|
The HashiCorp Vault URI (e.g., |
none |
|
Filename of a properties file containing Vault authentication properties. See the |
none |
|
If set, enables the HashiCorp Vault Transit provider. The value should be the Vault |
none |
|
If set, enables the HashiCorp Vault Key/Value provider. The value should be the Vault |
none |
Optional properties
Property Name | Description | Default |
---|---|---|
|
The Key/Value Secrets Engine version: |
|
|
The connection timeout of the Vault client |
|
|
The read timeout of the Vault client |
|
|
A comma-separated list of the enabled TLS cipher suites |
none |
|
A comma-separated list of the enabled TLS protocols |
none |
|
Path to a keystore. Required if the Vault server is TLS-enabled |
none |
|
Keystore type (JKS, BCFKS or PKCS12). Required if the Vault server is TLS-enabled |
none |
|
Keystore password. Required if the Vault server is TLS-enabled |
none |
|
Path to a truststore. Required if the Vault server is TLS-enabled |
none |
|
Truststore type (JKS, BCFKS or PKCS12). Required if the Vault server is TLS-enabled |
none |
|
Truststore password. Required if the Vault server is TLS-enabled |
none |
AWS KMS provider
This provider uses AWS Key Management Service for decryption. AWS KMS configuration properties can be stored in the bootstrap-aws.conf file, as referenced in bootstrap.conf. If the configuration properties are not specified in bootstrap-aws.conf, then the provider will attempt to use the AWS default credentials provider, which checks standard environment variables and system properties.
Required properties
Property Name | Description | Default |
---|---|---|
|
The identifier or ARN that the AWS KMS client uses for encryption and decryption. |
none |
Optional properties
All of the following properties must be configured together; otherwise, they are ignored entirely.
Property Name | Description | Default |
---|---|---|
|
The AWS region used to configure the AWS KMS Client. |
none |
|
The access key ID credential used to access AWS KMS. |
none |
|
The secret access key used to access AWS KMS. |
none |
AWS Secrets Manager provider
This provider uses AWS Secrets Manager Service to store and retrieve AWS Secrets. AWS Secrets Manager configuration properties can be stored in the bootstrap-aws.conf file, as referenced in bootstrap.conf. If the configuration properties are not specified in bootstrap-aws.conf, then the provider will attempt to use the AWS default credentials provider, which checks standard environment variables and system properties.
Optional properties
All of the following properties must be configured together; otherwise, they are ignored entirely.
Property Name | Description | Default |
---|---|---|
|
The AWS region used to configure the AWS Secrets Manager Client. |
none |
|
The access key ID credential used to access AWS Secrets Manager. |
none |
|
The secret access key used to access AWS Secrets Manager. |
none |
Azure Key Vault Key Provider
This protection scheme uses keys managed by Azure Key Vault Keys for encryption and decryption.
Azure Key Vault configuration properties can be stored in the bootstrap-azure.conf file, as referenced in the bootstrap.conf of Clockspring or Registry.
The provider will use the DefaultAzureCredential for authentication. The Azure Identity client library describes the process for credentials resolution, which leverages environment variables and system properties, and falls back to Managed Identity authentication.
Required properties
Property Name | Description | Default |
---|---|---|
|
The identifier of the key that the Azure Key Vault client uses for encryption and decryption. |
none |
|
The encryption algorithm that the Azure Key Vault client uses for encryption and decryption. |
none |
Azure Key Vault Secret Provider
This protection scheme uses secrets managed by Azure Key Vault Secrets for storing and retrieving protected properties.
Azure Key Vault configuration properties can be stored in the bootstrap-azure.conf file, as referenced in the bootstrap.conf of Clockspring or Registry.
The provider will use the DefaultAzureCredential for authentication. The Azure Identity client library describes the process for credentials resolution, which leverages environment variables and system properties, and falls back to Managed Identity authentication.
Names of secrets stored in Azure Key Vault support alphanumeric and dash characters, but do not support characters such as / or . (slash or period). For this reason, Clockspring replaces these characters with - when storing and retrieving secrets. The following table provides an example property name mapping:
Property Context | Property Name | Secret Name |
---|---|---|
|
|
|
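The character replacement described above can be sketched in a few lines. This is an illustration only: the function name is hypothetical, and it assumes that the secret name is derived from the `{context-name}/{property-name}` property context and that every character other than letters, digits, and dashes is replaced (the text only names / and . explicitly).

```python
import re


def to_secret_name(context_name: str, property_name: str) -> str:
    """Build a Key Vault-safe secret name from a property context.

    Assumption: any character that is not alphanumeric or a dash
    is replaced with a dash.
    """
    property_context = f"{context_name}/{property_name}"
    return re.sub(r"[^0-9A-Za-z-]", "-", property_context)


print(to_secret_name("default", "nifi.sensitive.props.key"))
# prints: default-nifi-sensitive-props-key
```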
Required properties
Property Name | Description | Default |
---|---|---|
|
URI for the Azure Key Vault service such as |
none |
Google Cloud KMS provider
This protection scheme uses Google Cloud Key Management Service (Google Cloud KMS) for encryption and decryption. Google Cloud KMS configuration properties are to be stored in the bootstrap-gcp.conf file, as referenced in the bootstrap.conf of Clockspring or Registry. Credentials must be configured as described in the Google Cloud KMS documentation.
Required properties
Property Name | Description | Default |
---|---|---|
|
The project containing the key that the Google Cloud KMS client uses for encryption and decryption. |
none |
|
The geographic region of the project containing the key that the Google Cloud KMS client uses for encryption and decryption. |
none |
|
The keyring containing the key that the Google Cloud KMS client uses for encryption and decryption. |
none |
|
The key identifier that the Google Cloud KMS client uses for encryption and decryption. |
none |
Property Context Mapping
Some encryption providers store protected values in an external service instead of persisting the encrypted values directly in the configuration file. To support this use case, a property context is defined for each protected property in Clockspring’s configuration files, in the format: {context-name}/{property-name}
-
context-name - represents a namespace for properties in order to disambiguate properties with the same name. Without additional configuration, all protected properties are assigned the default context.
-
property-name - contains the name of the property.
In order to support logical context names, mapping properties may be provided in bootstrap.conf, as follows:
nifi.bootstrap.protection.context.mapping.<context-name>=<identifier matching regex>
Here, context-name determines the context name above, and <identifier matching regex> maps any property whose group identifier matches the provided Regular Expression. Group identifiers are defined per configuration file type, and are described as follows:
Configuration File | Group Identifier Description | Assigned Context |
---|---|---|
|
There is no concept of a group identifier here, since all property names should be unique. |
default |
|
The |
The mapped context name if RegEx matches the identifier, otherwise default |
|
The |
The mapped context name if RegEx matches the identifier, otherwise default |
Example
In the Clockspring binary distribution, the login-identity-providers.xml file comes with a provider with the identifier ldap-provider and a property called Manager Password:
<provider>
<identifier>ldap-provider</identifier>
<class>org.apache.nifi.ldap.LdapProvider</class>
...
<property name="Manager Password"/>
...
</provider>
Similarly, the authorizers.xml file comes with an ldap-user-group-provider and a property also called Manager Password:
<userGroupProvider>
<identifier>ldap-user-group-provider</identifier>
<class>org.apache.nifi.ldap.tenants.LdapUserGroupProvider</class>
...
<property name="Manager Password"/>
...
</userGroupProvider>
To have both Manager Password properties reference the same protected value (e.g., the same Secret in the HashiCorp Vault K/V provider) while still distinguishing them from any other Manager Password property unrelated to LDAP, the following mapping could be added:
nifi.bootstrap.protection.context.mapping.ldap=ldap-.*
This would cause both of the above to be assigned a context of "ldap/Manager Password"
instead of "default/Manager Password"
.
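The mapping logic in this example can be sketched as a small lookup. This is not Clockspring's implementation: the function name is hypothetical, and the sketch assumes the regular expression must match the whole group identifier.

```python
import re

DEFAULT_CONTEXT = "default"


def property_context(group_identifier: str, property_name: str, mappings: dict) -> str:
    """Return "{context-name}/{property-name}" for a protected property.

    mappings maps a context name to the regular expression configured in
    nifi.bootstrap.protection.context.mapping.<context-name>.
    Assumption: the expression must match the entire group identifier.
    """
    for context_name, pattern in mappings.items():
        if re.fullmatch(pattern, group_identifier):
            return f"{context_name}/{property_name}"
    return f"{DEFAULT_CONTEXT}/{property_name}"


# Equivalent of: nifi.bootstrap.protection.context.mapping.ldap=ldap-.*
mappings = {"ldap": r"ldap-.*"}

print(property_context("ldap-provider", "Manager Password", mappings))
print(property_context("ldap-user-group-provider", "Manager Password", mappings))
```

Both calls yield `ldap/Manager Password`, while an identifier that does not match any mapping falls back to the `default` context.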
Toolkit Administrative Tools
In addition to tls-toolkit and encrypt-config, the Toolkit also contains command line utilities for administrators to support maintenance in standalone and clustered environments. These utilities include:
-
CLI: The cli tool enables administrators to interact with Clockspring and Registry instances to automate tasks such as deploying versioned flows and managing process groups and cluster nodes.
-
File Manager: The file-manager tool enables administrators to back up, install or restore a Clockspring installation from backup.
-
Flow Analyzer: The flow-analyzer tool produces a report that helps administrators understand the maximum amount of data that can be stored in backpressure for a given flow.
-
Node Manager: The node-manager tool enables administrators to perform status checks on nodes and to connect, disconnect, or remove nodes from the cluster.
-
Notify: The notify tool enables administrators to send bulletins to the UI.
-
S2S: The s2s tool enables administrators to send data into or out of flows over site-to-site.
Clustering Configuration
This section provides a quick overview of Clustering and instructions on how to set up a basic cluster.
Zero-Leader Clustering
Each node in the cluster has an identical flow and performs the same tasks on the data, but each operates on a different set of data. The cluster automatically distributes the data throughout all the active nodes.
One of the nodes is automatically elected (via Apache ZooKeeper) as the Cluster Coordinator. All nodes in the cluster then send heartbeat/status information to this node, and this node is responsible for disconnecting nodes that do not report any heartbeat status for some amount of time. Additionally, when a new node elects to join the cluster, it must first connect to the currently elected Cluster Coordinator in order to obtain the most up-to-date flow. If the Cluster Coordinator determines that the node is allowed to join (based on its configured Firewall file), the current flow is provided to that node, and that node is able to join the cluster, assuming that the node’s copy of the flow matches the copy provided by the Cluster Coordinator. If the node’s version of the flow configuration differs from that of the Cluster Coordinator, the node will not join the cluster.
Why Cluster?
Users may find that a single instance on a single server is not enough to process the amount of data they have. By clustering the Clockspring servers, it’s possible to have that increased processing capability along with a single interface through which to make dataflow changes and monitor the dataflow. Clustering allows the DFM to make each change only once, and that change is then replicated to all the nodes of the cluster. Through the single interface, the DFM may also monitor the health and status of all the nodes.
Terminology
Cluster Coordinator: A Cluster Coordinator is the node in the cluster that is responsible for carrying out tasks to manage which nodes are allowed in the cluster and providing the most up-to-date flow to newly joining nodes. When a DataFlow Manager manages a dataflow in a cluster, they are able to do so through the User Interface of any node in the cluster. Any change made is then replicated to all nodes in the cluster.
Nodes: Each cluster is made up of one or more nodes. The nodes do the actual data processing.
Primary Node: Every cluster has one Primary Node. On this node, it is possible to run "Isolated Processors" (see below). ZooKeeper is used to automatically elect a Primary Node. If that node disconnects from the cluster for any reason, a new Primary Node will automatically be elected. Users can determine which node is currently elected as the Primary Node by looking at the Cluster Management page of the User Interface.
Isolated Processors: The same dataflow runs on all the nodes in the cluster. As a result, every component in the flow runs on every node. However, there may be cases when the DFM would not want every processor to run on every node. The most common case is when using a processor that communicates with an external service using a protocol that does not scale well. For example, the GetSFTP processor pulls from a remote directory. If the GetSFTP Processor runs on every node in the cluster and tries simultaneously to pull from the same remote directory, there could be race conditions. Therefore, the DFM could configure the GetSFTP on the Primary Node to run in isolation, meaning that it only runs on that node. With the proper dataflow configuration, it could pull in data and load-balance it across the rest of the nodes in the cluster.
Heartbeats: The nodes communicate their health and status to the currently elected Cluster Coordinator via "heartbeats", which let the Coordinator know they are still connected to the cluster and working properly. By default, the nodes emit heartbeats every 5 seconds, and if the Cluster Coordinator does not receive a heartbeat from a node within 40 seconds (= 5 seconds * 8), it disconnects the node due to "lack of heartbeat". The 5-second and 8 times settings are configurable in the clockspring.properties file (see the Cluster Common Properties section for more information). The reason that the Cluster Coordinator disconnects the node is because the Coordinator needs to ensure that every node in the cluster is in sync, and if a node is not heard from regularly, the Coordinator cannot be sure it is still in sync with the rest of the cluster. If, after 40 seconds, the node does send a new heartbeat, the Coordinator will automatically request that the node re-join the cluster, to include the re-validation of the node’s flow. Both the disconnection due to lack of heartbeat and the reconnection once a heartbeat is received are reported to the DFM in the User Interface.
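The 40-second figure is simply the product of the two configurable settings. As a minimal sketch of that arithmetic (the function and constant names are hypothetical, and timestamps are plain seconds for simplicity):

```python
HEARTBEAT_INTERVAL_SECONDS = 5   # default heartbeat interval from the text
MISSABLE_HEARTBEATS = 8          # default multiplier from the text


def disconnect_threshold(interval: float = HEARTBEAT_INTERVAL_SECONDS,
                         missable: int = MISSABLE_HEARTBEATS) -> float:
    """Seconds without a heartbeat before a node is disconnected."""
    return interval * missable


def should_disconnect(last_heartbeat: float, now: float) -> bool:
    """True when more time has passed than the threshold allows."""
    return now - last_heartbeat > disconnect_threshold()


print(disconnect_threshold())       # 5 * 8 = 40
print(should_disconnect(0.0, 41.0))
```

Raising either setting widens the window; for instance, a 10-second interval with the same multiplier gives an 80-second threshold.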
Communication within the Cluster
As noted, the nodes communicate with the Cluster Coordinator via heartbeats. When a Cluster Coordinator is elected, it updates a well-known ZNode in Apache ZooKeeper with its connection information so that nodes understand where to send heartbeats. If one of the nodes goes down, the other nodes in the cluster will not automatically pick up the load of the missing node. It is possible for the DFM to configure the dataflow for failover contingencies; however, this is dependent on the dataflow design and does not happen automatically.
When the DFM makes changes to the dataflow, the node that receives the request to change the flow communicates those changes to all nodes and waits for each node to respond, indicating that it has made the change on its local flow.
Managing Nodes
Disconnect Nodes
A DFM may manually disconnect a node from the cluster. A node may also become disconnected for other reasons, such as due to a lack of heartbeat. The Cluster Coordinator will show a bulletin on the User Interface when a node is disconnected. The DFM will not be able to make any changes to the dataflow until the issue of the disconnected node is resolved. The DFM or the Administrator will need to troubleshoot the issue with the node and resolve it before any new changes can be made to the dataflow. However, it is worth noting that just because a node is disconnected does not mean that it is not working. This may happen for a few reasons, for example when the node is unable to communicate with the Cluster Coordinator due to network problems.
To manually disconnect a node, select the "Disconnect" icon from the node’s row.
A disconnected node can be connected, offloaded or deleted.
Not all nodes in a "Disconnected" state can be offloaded. If the node is disconnected and unreachable, the offload request cannot be received by the node to start the offloading. Additionally, offloading may be interrupted or prevented due to firewall rules.
Offload Nodes
Flowfiles that remain on a disconnected node can be rebalanced to other active nodes in the cluster via offloading. In the Cluster Management dialog, select the "Offload" icon for a Disconnected node. This will stop all processors, terminate all processors, stop transmitting on all remote process groups and rebalance flowfiles to the other connected nodes in the cluster.
Nodes that remain in "Offloading" state due to errors encountered (out of memory, no network connection, etc.) can be reconnected to the cluster by restarting Clockspring on the node. Offloaded nodes can be either reconnected to the cluster (by selecting Connect or restarting the Clockspring service on the node) or deleted from the cluster.
Delete Nodes
There are cases where a DFM may wish to continue making changes to the flow, even though a node is not connected to the cluster. In this case, the DFM may elect to delete the node from the cluster entirely. In the Cluster Management dialog, select the "Delete" icon for a Disconnected or Offloaded node. Once deleted, the node cannot be rejoined to the cluster until it has been restarted.
Flow Election
When a cluster first starts up, it must determine which of the nodes have the "correct" version of the flow. This is done by voting on the flows that each of the nodes has. When a node attempts to connect to a cluster, it provides a copy of its local flow and its users, groups, and policies to the Cluster Coordinator. If no flow has yet been elected the "correct" flow, the node’s flow is compared to each of the other Nodes' flows. If another Node’s flow matches this one, a vote is cast for this flow. If no other Node has reported the same flow yet, this flow will be added to the pool of possibly elected flows with one vote. After some amount of time has elapsed (configured by setting the nifi.cluster.flow.election.max.wait.time property) or some number of Nodes have cast votes (configured by setting the nifi.cluster.flow.election.max.candidates property), a flow is elected to be the "correct" copy of the flow.
Any node whose dataflow, users, groups, and policies conflict with those elected will back up any conflicting resources and replace the local resources with those from the cluster. How the backup is performed depends on the configured Access Policy Provider and User Group Provider. For file-based access policy providers, the backup will be written to the same directory as the existing file and bear the same name but with a suffix of "." and a timestamp. For example, if the flow itself conflicts with the cluster’s flow at 12:05:03 on January 1, 2020, the node’s flow.xml.gz file will be copied to flow.xml.gz.2020-01-01-12-05-03 and the cluster’s flow will then be written to flow.xml.gz. Similarly, this will happen for the users.xml and authorizations.xml files. This is done so that the flow can be manually reverted if necessary, for example by renaming the backup file back to flow.xml.gz.
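The backup naming convention can be reproduced with a short helper, which may be handy when scripting a manual revert. The function name is hypothetical; the timestamp format is inferred from the single example given above.

```python
from datetime import datetime


def backup_name(file_name: str, when: datetime) -> str:
    """Append "." and a timestamp, matching the example in the text."""
    return f"{file_name}.{when.strftime('%Y-%m-%d-%H-%M-%S')}"


print(backup_name("flow.xml.gz", datetime(2020, 1, 1, 12, 5, 3)))
# prints: flow.xml.gz.2020-01-01-12-05-03
```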
It is important to note that before inheriting the elected flow, Clockspring will first read through the FlowFile repository and any swap files to determine which queues in the dataflow currently hold data. If there exists any queue in the dataflow that contains a FlowFile, that queue must also exist in the elected dataflow. If that queue does not exist in the elected dataflow, the node will not inherit the dataflow, users, groups, and policies. Instead, Clockspring will log errors to that effect and will fail to start up. This ensures that even if the node has data stored in a connection, and the cluster’s dataflow is different, restarting the node will not result in data loss.
Election is performed according to the "popular vote" with the caveat that the winner will never be an "empty flow" unless all flows are empty. This allows an administrator to remove a node’s flow.xml.gz file and restart the node, knowing that the node’s flow will not be voted to be the "correct" flow unless no other flow is found. If there are two non-empty flows that receive the same number of votes, one of those flows will be chosen. The methodology used to determine which of those flows is undefined and may change at any time without notice.
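The "popular vote" rule with its empty-flow caveat can be sketched as follows. This is a simplified model, not Clockspring's code: each node's flow is represented by a hypothetical fingerprint string, an empty string stands for an empty flow, and the tie-break shown here is arbitrary because the text leaves it undefined.

```python
from collections import Counter
from typing import List


def elect_flow(votes: List[str]) -> str:
    """Pick the winning flow fingerprint from one vote per node.

    An empty string represents an empty flow. Empty flows can only win
    when every vote is for an empty flow.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    non_empty = [vote for vote in votes if vote]
    candidates = non_empty if non_empty else votes
    # On a tie, Counter returns the first flow seen; the real tie-break
    # is undefined, so do not rely on this ordering.
    return Counter(candidates).most_common(1)[0][0]


print(elect_flow(["flow-A", "flow-B", "flow-A"]))  # flow-A has two votes
print(elect_flow(["", "", "flow-A"]))              # empty flows never beat flow-A
```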
Basic Cluster Setup
This section describes the setup for a simple three-node cluster.
For each instance, certain properties in the clockspring.properties file will need to be updated. In particular, the Web and Clustering properties should be evaluated for your situation and adjusted accordingly. All the properties are described in the System Properties section of this guide; however, in this section, we will focus on the minimum properties that must be set for a simple cluster.
For all three instances, the Cluster Common Properties can be left with the default settings. Note, however, that if you change these settings, they must be set the same on every instance in the cluster.
For each Node, the minimum properties to configure are as follows:
-
Under the Web Properties section, set the HTTPS port that you want the Node to run on. Also, consider whether you need to set the HTTPS host property. All nodes in the cluster should use the same protocol setting.
-
Under the State Management section, set the nifi.state.management.provider.cluster property to the identifier of the Cluster State Provider. Ensure that the Cluster State Provider has been configured in the state-management.xml file. See Configuring State Providers for more information.
-
Under Cluster Node Properties, set the following:
-
nifi.cluster.is.node - Set this to true.
-
nifi.cluster.node.address - Set this to the fully qualified hostname of the node. If left blank, it defaults to localhost.
-
nifi.cluster.node.protocol.port - Set this to an open port that is higher than 1024 (anything lower requires root).
-
nifi.cluster.node.protocol.max.threads - The maximum number of threads that should be used to communicate with other nodes in the cluster. This property defaults to 50. A thread pool is used for replicating requests to all nodes. The thread pool will increase the number of active threads to the limit set by this property. It is typically recommended that this property be set to 4-8 times the number of nodes in your cluster.
-
nifi.zookeeper.connect.string - The Connect String that is needed to connect to Apache ZooKeeper. This is a comma-separated list of hostname:port pairs. For example, localhost:2181,localhost:2182,localhost:2183. This should contain a list of all ZooKeeper instances in the ZooKeeper quorum.
-
nifi.zookeeper.root.node - The root ZNode that should be used in ZooKeeper. ZooKeeper provides a directory-like structure for storing data. Each 'directory' in this structure is referred to as a ZNode. This denotes the root ZNode, or 'directory', that should be used for storing data. The default value is /root. This is important to set correctly, as which cluster the instance attempts to join is determined by which ZooKeeper instance it connects to and the ZooKeeper Root Node that is specified.
-
nifi.cluster.flow.election.max.wait.time - Specifies the amount of time to wait before electing a Flow as the "correct" Flow. If the number of Nodes that have voted is equal to the number specified by the nifi.cluster.flow.election.max.candidates property, the cluster will not wait this long. The default value is 5 mins. Note that the time starts as soon as the first vote is cast.
-
nifi.cluster.flow.election.max.candidates - Specifies the number of Nodes required in the cluster to cause early election of Flows. This allows the Nodes in the cluster to avoid having to wait a long time before starting processing if we reach at least this number of nodes in the cluster.
-
Now, it is possible to start up the cluster. It does not matter in which order the instances start up. Navigate to the URL for one of the nodes, and the User Interface should look similar to the following:
Cluster Firewall Configuration
Clustering supports network access restrictions using a custom firewall configuration. The nifi.cluster.firewall.file property can be configured with a path to a file containing hostnames, IP addresses, or subnets of permitted nodes. The Cluster Coordinator uses the configuration to determine whether to accept or reject heartbeats and connection requests from potential cluster members.
The configuration file format expects one entry per line and ignores lines beginning with the # character. Clockspring uses standard Java host name resolution to convert names to IP addresses. Java host name resolution leverages a combination of local machine configuration and network services, such as DNS. The configuration file supports IPv4 addresses or subnet ranges using CIDR notation. The following example cluster firewall configuration includes a combination of supported entries:
# Cluster Node Hostnames
clockspring0.example.com
clockspring1.example.com
clockspring3.example.com
# Cluster Node Addresses
192.168.0.1
192.168.0.2
192.168.0.3
# Cluster Subnet Address
# Address Range from 192.168.0.1 to 192.168.0.6
192.168.0.0/29
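To see how entries of this kind can be evaluated, the following sketch parses a firewall file and checks whether an address is permitted. It is an illustration written with Python's standard `ipaddress` module, not Clockspring's Java implementation. The function names are hypothetical, blank lines are skipped as an assumption, and only lines that begin with # are treated as comments, as the text describes.

```python
import ipaddress
import socket
from typing import Callable, List


def load_firewall(text: str,
                  resolve: Callable[[str], str] = socket.gethostbyname) -> List:
    """Parse firewall entries into a list of permitted networks.

    Each non-comment line is an IPv4 address, a CIDR subnet, or a
    hostname that is resolved to an address via `resolve`.
    """
    networks = []
    for line in text.splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        try:
            # A bare address becomes a single-host (/32) network.
            networks.append(ipaddress.ip_network(entry, strict=False))
        except ValueError:
            # Not an address or subnet, so treat the entry as a hostname.
            networks.append(ipaddress.ip_network(resolve(entry)))
    return networks


def is_permitted(address: str, networks: List) -> bool:
    """True if the address falls inside any permitted network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in networks)


# Usage with a stub resolver so the example does not depend on DNS.
config = "# Cluster Subnet Address\n192.168.0.0/29\nclockspring0.example.com\n"
networks = load_firewall(config, resolve=lambda host: "10.0.0.5")
print(is_permitted("192.168.0.6", networks))
print(is_permitted("192.168.0.9", networks))
```

Passing the resolver as a parameter keeps the parsing logic testable without network access; by default it uses the operating system's name resolution.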
Troubleshooting
If you encounter issues and your cluster does not work as described, investigate the _application.log and user.log files on the nodes. If needed, you can change the logging level to DEBUG by editing the conf/logback.xml file. Specifically, set level="DEBUG" (instead of "INFO") in the following line:
<logger name="org.apache.nifi.web.api.config" level="INFO" additivity="false">
    <appender-ref ref="USER_FILE"/>
</logger>
State Management
Clockspring provides a mechanism for Processors, Reporting Tasks, Controller Services, and the framework itself to persist state. This allows a Processor, for example, to resume from the place where it left off after an instance is restarted. Additionally, it allows for a Processor to store some piece of information so that the Processor can access that information from all of the different nodes in the cluster. This allows one node to pick up where another node left off, or to coordinate across all of the nodes in a cluster.
Configuring State Providers
When a component decides to store or retrieve state, it does so by providing a "Scope" - either Node-local or Cluster-wide. The mechanism that is used to store and retrieve this state is then determined based on this Scope, as well as the configured State Providers. The clockspring.properties file contains three different properties that are relevant to configuring these State Providers.
Property |
Description |
|
The first is the property that specifies an external XML file that is used for configuring the local and/or cluster-wide State Providers. This XML file may contain configurations for multiple providers |
|
The property that provides the identifier of the local State Provider configured in this XML file |
|
Similarly, the property provides the identifier of the cluster-wide State Provider configured in this XML file. |
This XML file consists of a top-level state-management element, which has one or more local-provider and zero or more cluster-provider elements. Each of these elements then contains an id element that is used to specify the identifier that can be referenced in the clockspring.properties file, as well as a class element that specifies the fully-qualified class name to use in order to instantiate the State Provider. Finally, each of these elements may have zero or more property elements. Each property element has an attribute, name, that is the name of the property that the State Provider supports. The textual content of the property element is the value of the property.
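To make the structure concrete, the sketch below reads a hand-written sample that follows the description above and lists each provider's identifier, class, and properties. The sample XML is an assumption built from the prose only: the element names are taken from the text, the class names are placeholders, and the shipped state-management.xml should be treated as the authoritative layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; class names are placeholders, not real classes.
SAMPLE = """
<state-management>
  <local-provider>
    <id>local-provider</id>
    <class>com.example.PlaceholderLocalStateProvider</class>
    <property name="Directory">./state/local</property>
  </local-provider>
  <cluster-provider>
    <id>zk-provider</id>
    <class>com.example.PlaceholderClusterStateProvider</class>
    <property name="Connect String">my-zk-server1:2181</property>
    <property name="Root Node">/clockspring</property>
  </cluster-provider>
</state-management>
"""


def read_providers(xml_text: str) -> dict:
    """Map each provider id to its scope, class name, and properties."""
    root = ET.fromstring(xml_text)
    providers = {}
    for element in root:
        if element.tag not in ("local-provider", "cluster-provider"):
            continue
        providers[element.findtext("id")] = {
            "scope": element.tag,
            "class": element.findtext("class"),
            "properties": {
                prop.get("name"): (prop.text or "")
                for prop in element.findall("property")
            },
        }
    return providers


providers = read_providers(SAMPLE)
for provider_id, details in providers.items():
    print(provider_id, details["scope"], details["properties"])
```

The identifiers returned here (`local-provider`, `zk-provider`) are the values that the state management properties in clockspring.properties would reference.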
Once these State Providers have been configured in the state-management.xml file (or whatever file is configured), those Providers may be referenced by their identifiers.
By default, the Local State Provider is configured to be a WriteAheadLocalStateProvider
that persists the data to the
$CLOCKSPRING_HOME/state/local
directory. The default Cluster State Provider is configured to be a ZooKeeperStateProvider
. The default
ZooKeeper-based provider must have its Connect String
property populated before it can be used. It is also advisable, if multiple instances
will use the same ZooKeeper instance, that the value of the Root Node
property be changed. For instance, one might set the value to
/clockspring/<team name>/production
. A Connect String
takes the form of comma separated <host>:<port> tuples, such as
my-zk-server1:2181,my-zk-server2:2181,my-zk-server3:2181
. In the event a port is not specified for any of the hosts, the ZooKeeper default of
2181
is assumed.
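The Connect String format can be made concrete with a short Python sketch that splits the value into host and port pairs and applies the 2181 default when a port is omitted. The function name is hypothetical, and the sketch does not attempt to handle IPv6 literals.

```python
DEFAULT_ZK_PORT = 2181  # ZooKeeper default, used when an entry has no port


def parse_connect_string(connect_string):
    """Split a comma-separated Connect String into (host, port) tuples."""
    servers = []
    for entry in connect_string.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        host, _, port = entry.partition(":")
        servers.append((host, int(port) if port else DEFAULT_ZK_PORT))
    return servers
```

For example, `parse_connect_string("my-zk-server1:2181,my-zk-server2")` yields two entries, the second of which falls back to port 2181.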
When adding data to ZooKeeper, there are two options for Access Control: Open
and CreatorOnly
. If the Access Control
property is
set to Open
, then anyone is allowed to log into ZooKeeper and have full permissions to see, change, delete, or administer the data.
If CreatorOnly
is specified, then only the user that created the data is allowed to read, change, delete, or administer the data.
In order to use the CreatorOnly
option, Clockspring must provide some form of authentication. See the ZooKeeper Access Control
section below for more information on how to configure authentication.
If Clockspring is configured to run in a standalone mode, the cluster-provider
element need not be populated in the state-management.xml
file and will actually be ignored if it is populated. However, the local-provider
element must always be present and populated.
Additionally, if Clockspring is run in a cluster, each node must also have the cluster-provider
element present and properly configured.
Otherwise, Clockspring will fail to start.
While there are not many properties that need to be configured for these providers, they were externalized into a separate state-management.xml file, rather than being configured via the clockspring.properties file, because different implementations may require different properties, and it is easier to maintain and understand the configuration in an XML-based file such as this than to mix the properties of the Provider in with all of the other Clockspring framework-specific properties.
It should be noted that if Processors and other components save state using the Clustered scope, the Local State Provider will be used if the instance is a standalone instance (not in a cluster) or is disconnected from the cluster. This also means that if a standalone instance is migrated to become a cluster, then that state will no longer be available, as the component will begin using the Clustered State Provider instead of the Local State Provider.
ZooKeeper Access Control
ZooKeeper provides Access Control to its data via an Access Control List (ACL) mechanism. When data is written to ZooKeeper, Clockspring will provide an ACL
that indicates that any user is allowed to have full permissions to the data, or an ACL that indicates that only the user that created the data is
allowed to access the data. Which ACL is used depends on the value of the Access Control
property for the ZooKeeperStateProvider
(see the
Configuring State Providers section for more information).
In order to use an ACL that indicates that only the Creator is allowed to access the data, we need to tell ZooKeeper who the Creator is. There are three mechanisms for accomplishing this. The first mechanism is to provide authentication using Kerberos. See Kerberizing Clockspring’s ZooKeeper Client for more information.
The second option, which additionally ensures that network communication is encrypted, is to authenticate using an X.509 certificate on a TLS-enabled ZooKeeper server. See Securing ZooKeeper with TLS for more information.
The third option is to use a username and password. This is configured by specifying a value for the Username
and a value for the Password
properties
for the ZooKeeperStateProvider
(see the Configuring State Providers section for more information). The important thing to keep in mind here, though, is that ZooKeeper
will pass around the password in plain text. This means that a username and password should not be used unless ZooKeeper is running on localhost as a
one-instance cluster, or unless communication with ZooKeeper occurs only over an encrypted channel, such as a VPN or an SSL connection.
Securing ZooKeeper with Kerberos
When Clockspring communicates with ZooKeeper, all communications, by default, are non-secure, and anyone who logs into ZooKeeper is able to view and manipulate all of the state that is stored in ZooKeeper. To prevent this, one option is to use Kerberos to manage authentication.
In order to secure the communications with Kerberos, we need to ensure that both the client and the server support the same configuration. Instructions for configuring the ZooKeeper client and embedded ZooKeeper server to use Kerberos are provided below.
If Kerberos is not already set up in your environment, you can find information on installing and setting up a Kerberos Server at Red Hat Customer Portal: Configuring a Kerberos 5 Server. This guide assumes that Kerberos has already been installed in the environment in which Clockspring is running.
Note, the following procedures for kerberizing an Embedded ZooKeeper server in your Cluster Node and kerberizing a ZooKeeper client will require that Kerberos client libraries be installed. This is accomplished in Fedora-based Linux distributions via:
yum install krb5-workstation
Once this is complete, the /etc/krb5.conf file will need to be configured appropriately for your organization's Kerberos environment.
Kerberizing Embedded ZooKeeper Server
The krb5.conf file on the systems with the embedded zookeeper servers should be identical to the one on the system where the krb5kdc service is running. When using the embedded ZooKeeper server, we may choose to secure the server by using Kerberos. All nodes configured to launch an embedded ZooKeeper and using Kerberos should follow these steps.
In order to use Kerberos, we first need to generate a Kerberos Principal for our ZooKeeper servers. The following command is run on the server where the krb5kdc service is running. This is accomplished via the kadmin tool:
kadmin: addprinc "zookeeper/myHost.example.com@EXAMPLE.COM"
Here, we are creating a Principal with the primary zookeeper/myHost.example.com
, using the realm EXAMPLE.COM
. We need to use a Principal whose
name is <service name>/<instance name>
. In this case, the service is zookeeper
and the instance name is myHost.example.com
(the fully qualified name of our host).
Next, we will need to create a KeyTab for this Principal. This command is run on the server with the Clockspring instance that has an embedded ZooKeeper server:
kadmin: xst -k zookeeper-server.keytab zookeeper/myHost.example.com@EXAMPLE.COM
This will create a file in the current directory named zookeeper-server.keytab
. We can now copy that file into the $CLOCKSPRING_HOME/conf/
directory. We should ensure
that only the user that will be running Clockspring is allowed to read this file.
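One way to verify this is to check the file's permission bits. The Python sketch below is a minimal check, assuming a POSIX file system; the function name is hypothetical, and it only inspects the group and other permission bits, not file ownership.

```python
import os
import stat


def readable_only_by_owner(path):
    """Return True if neither group nor others have any permission on the file."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

If the check returns False for the keytab, tightening the mode (for example to 600) while logged in as the user that runs Clockspring is the usual remedy.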
We will need to repeat the above steps for each of the instances of Clockspring that will be running the embedded ZooKeeper server, being sure to replace myHost.example.com
with
myHost2.example.com
, or whatever fully qualified hostname the ZooKeeper server will be run on.
Now that we have our KeyTab for each of the servers that will be running Clockspring, we will need to configure Clockspring's embedded ZooKeeper server to use this configuration.
ZooKeeper uses the Java Authentication and Authorization Service (JAAS), so we need to create a JAAS-compatible file. In the $CLOCKSPRING_HOME/conf/
directory, create a file
named zookeeper-jaas.conf (this file will already exist if the Client has already been configured to authenticate via Kerberos; that is fine, just add to the file).
We will add the following snippet to this file:
Server {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="./conf/zookeeper-server.keytab"
storeKey=true
useTicketCache=false
principal="zookeeper/myHost.example.com@EXAMPLE.COM";
};
Be sure to replace the value of principal
above with the appropriate Principal, including the fully qualified domain name of the server.
Next, we need to tell Clockspring to use this as our JAAS configuration. This is done by setting a JVM System Property, so we will edit the conf/bootstrap.conf file. If the Client has already been configured to use Kerberos, this is not necessary, as it was done above. Otherwise, we will add the following line to our bootstrap.conf file:
java.arg.15=-Djava.security.auth.login.config=./conf/zookeeper-jaas.conf
This additional line in the file doesn't have to be number 15; it just has to be added to the bootstrap.conf file. Use whatever number is appropriate for your configuration. |
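To pick a number that is not already taken, the existing java.arg entries can be scanned first. The Python sketch below is one way to do that; the function name and the sample file content are hypothetical, and it simply proposes one more than the highest number in use.

```python
import re

# Matches active (uncommented) lines such as "java.arg.15=-D..."
JAVA_ARG = re.compile(r"\s*java\.arg\.(\d+)\s*=")


def next_java_arg_index(bootstrap_conf_text):
    """Return an unused java.arg number: one more than the highest in use."""
    used = set()
    for line in bootstrap_conf_text.splitlines():
        match = JAVA_ARG.match(line)
        if match:
            used.add(int(match.group(1)))
    return max(used, default=0) + 1
```

Commented-out lines are ignored because they begin with `#` and therefore do not match the pattern.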
We will want to initialize our Kerberos ticket by running the following command:
kinit -kt zookeeper-server.keytab "zookeeper/myHost.example.com@EXAMPLE.COM"
Again, be sure to replace the Principal with the appropriate value, including your realm and your fully qualified hostname.
Finally, we need to tell the ZooKeeper server to use the SASL Authentication Provider. To do this, we edit the $CLOCKSPRING_HOME/conf/zookeeper.properties file and add the following lines:
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
kerberos.removeHostFromPrincipal=true
kerberos.removeRealmFromPrincipal=true
jaasLoginRenew=3600000
requireClientAuthScheme=sasl
The kerberos.removeHostFromPrincipal
and the kerberos.removeRealmFromPrincipal
properties are used to normalize the user principal name before comparing an identity to ACLs
applied on a Znode. By default, the full principal is used; however, setting the kerberos.removeHostFromPrincipal
and the kerberos.removeRealmFromPrincipal
properties to true will instruct
ZooKeeper to remove the host and the realm from the logged in user’s identity for comparison. In cases where nodes within the same cluster use principals that
have different host(s)/realm(s) values, these kerberos properties can be configured to ensure that the nodes' identity will be normalized and that the nodes will have
appropriate access to shared Znodes in ZooKeeper.
The last line is optional but specifies that clients MUST use Kerberos to communicate with our ZooKeeper instance.
Now, we can start Clockspring, and the embedded ZooKeeper server will use Kerberos as the authentication mechanism.
Kerberizing Clockspring’s ZooKeeper Client
The cluster nodes running the embedded zookeeper server will also need to follow the below procedure since they will also be acting as a client at the same time. |
The preferred mechanism for authenticating users with ZooKeeper is to use Kerberos. In order to use Kerberos to authenticate, we must configure a few
system properties, so that the ZooKeeper client knows who the user is and where the KeyTab file is. All nodes configured to store cluster-wide state
using ZooKeeperStateProvider
and using Kerberos should follow these steps.
First, we must create the Principal that we will use when communicating with ZooKeeper. This is generally done via the kadmin
tool:
kadmin: addprinc "clockspring@EXAMPLE.COM"
A Kerberos Principal is made up of three parts: the primary, the instance, and the realm. Here, we are creating a Principal with the primary clockspring
,
no instance, and the realm EXAMPLE.COM
. The primary (clockspring
, in this case) is the identifier that will be used to identify the user when authenticating
via Kerberos.
After we have created our Principal, we will need to create a KeyTab for the Principal:
kadmin: xst -k clockspring.keytab clockspring@EXAMPLE.COM
This will create a file in the current directory named clockspring.keytab. We can now copy that file into the $CLOCKSPRING_HOME/conf/
directory. We should ensure
that only the user that will be running Clockspring is allowed to read this file.
This keytab file can also be copied to the other nodes with embedded ZooKeeper servers.
Next, we need to configure Clockspring to use this KeyTab for authentication. Since ZooKeeper uses the Java Authentication and Authorization Service (JAAS), we need to
create a JAAS-compatible file. In the $CLOCKSPRING_HOME/conf/
directory, create a file named zookeeper-jaas.conf and add to it the following snippet:
Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="./conf/clockspring.keytab"
storeKey=true
useTicketCache=false
principal="clockspring@EXAMPLE.COM";
};
We then need to tell Clockspring to use this as our JAAS configuration. This is done by setting a JVM System Property, so we will edit the conf/bootstrap.conf file. We add the following line anywhere in this file in order to tell the Clockspring JVM to use this configuration:
java.arg.15=-Djava.security.auth.login.config=./conf/zookeeper-jaas.conf
Finally, we need to update clockspring.properties to ensure that Clockspring knows to apply SASL-specific ACLs for the Znodes it will create in ZooKeeper for cluster management. To enable this, edit the following properties in the $CLOCKSPRING_HOME/conf/clockspring.properties file as shown below:
nifi.zookeeper.auth.type=sasl
nifi.zookeeper.kerberos.removeHostFromPrincipal=true
nifi.zookeeper.kerberos.removeRealmFromPrincipal=true
The kerberos.removeHostFromPrincipal and kerberos.removeRealmFromPrincipal properties should be consistent with what is set in the ZooKeeper configuration. |
We can initialize our Kerberos ticket by running the following command:
kinit -kt clockspring.keytab clockspring@EXAMPLE.COM
Now, when we start Clockspring, it will use Kerberos to authenticate as the clockspring
user when communicating with ZooKeeper.
Troubleshooting Kerberos Configuration
When using Kerberos, it is important to use fully qualified domain names (FQDNs) and not localhost. Please ensure that the fully qualified hostname of each server is used in the following locations:
- The conf/zookeeper.properties file should use the FQDN for the server.1, server.2, …, server.N values.
- The Connect String property of the ZooKeeperStateProvider.
- The /etc/hosts file should also resolve the FQDN to an IP address that is not 127.0.0.1.
Failure to do so may result in errors similar to the following:
2016-01-08 16:08:57,888 ERROR [pool-26-thread-1-SendThread(localhost:2181)] o.a.zookeeper.client.ZooKeeperSaslClient An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)]) occurred when evaluating ZooKeeper Quorum Member's received SASL token. ZooKeeper Client will go to AUTH_FAILED state.
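The hostname requirements listed above can be checked ahead of time. The following Python sketch flags a hostname that is not fully qualified or that resolves to a loopback address. The function name is hypothetical, and the resolver is passed in as a parameter so the check can also be exercised without a live DNS lookup.

```python
import ipaddress
import socket


def kerberos_host_problems(hostname, resolve=socket.gethostbyname):
    """Return a list of reasons why the hostname may break Kerberos authentication."""
    problems = []
    if hostname == "localhost" or "." not in hostname:
        problems.append("not a fully qualified domain name")
    try:
        address = resolve(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        problems.append(f"does not resolve: {exc}")
    else:
        if ipaddress.ip_address(address).is_loopback:
            problems.append(f"resolves to loopback address {address}")
    return problems
```

Running this against each hostname used in zookeeper.properties and in the Connect String gives a quick indication of whether the error above is likely to occur.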
If there are problems communicating or authenticating with Kerberos, this Troubleshooting Guide may be of value.
One of the most important notes in the above Troubleshooting guide is the mechanism for turning on Debug output for Kerberos.
This is done by setting the sun.security.krb5.debug
Java system property.
In Clockspring this is accomplished by adding the following line to the $CLOCKSPRING_HOME/conf/bootstrap.conf file:
java.arg.16=-Dsun.security.krb5.debug=true
This will cause the debug output to be written to the Bootstrap log file. By default, this is located at $CLOCKSPRING_HOME/logs/bootstrap.log. This output can be rather verbose but provides extremely valuable information for troubleshooting Kerberos failures.
Securing ZooKeeper with TLS
As discussed above, communications with ZooKeeper are insecure by default. The second option for securely authenticating to and communicating with ZooKeeper is to use certificate-based authentication with a TLS-enabled ZooKeeper server (available since ZooKeeper’s 3.5.x releases). Instructions for enabling TLS on an external ZooKeeper ensemble can be found in the ZooKeeper Administrator’s Guide.
Once you have a TLS-enabled instance of ZooKeeper, TLS can be enabled for the client by setting nifi.zookeeper.client.secure=true
. By default, the ZooKeeper client will use the existing nifi.security.*
properties for the keystore and truststore. If you require separate TLS configuration for ZooKeeper, you can create a separate keystore and truststore and configure the following properties
in the $CLOCKSPRING_HOME/conf/clockspring.properties file:
Property Name | Description | Default |
---|---|---|
|
Whether to access ZooKeeper using client TLS. |
false |
|
Filename of the Keystore containing the private key to use when communicating with ZooKeeper. |
none |
|
Optional. The type of the Keystore. Must be |
none |
|
The password for the Keystore. |
none |
|
Filename of the Truststore that will be used to verify the ZooKeeper server(s). |
none |
|
Optional. The type of the Truststore. Must be |
none |
|
The password for the Truststore. |
none |
Whether using the default security properties or the ZooKeeper specific properties, the keystore and truststores must contain the appropriate keys and certificates for use with ZooKeeper (i.e., the keys and certificates need to align with the ZooKeeper configuration either way).
After updating the above properties and starting Clockspring, network communication with ZooKeeper will be secure and ZooKeeper will now use the Clockspring node’s certificate principal when authenticating access. This will be reflected in log messages like the following on the ZooKeeper server:
2020-02-24 23:37:52,671 [myid:2] - INFO [nioEventLoopGroup-4-1:X509AuthenticationProvider@172] - Authenticated Id 'CN=clockspring-node1,OU=NIFI' for Scheme 'x509'
ZooKeeper uses Netty to support network encryption and certificate-based authentication. When TLS is enabled, both the ZooKeeper server and its clients must be configured to use Netty-based
connections instead of the default NIO implementations. This is configured automatically for Clockspring when nifi.zookeeper.client.secure
is set to
true. Once Netty is enabled, you should see log messages like the following in $CLOCKSPRING_HOME/logs/_application.log:
2020-02-24 23:37:54,082 INFO [nioEventLoopGroup-3-1] o.apache.zookeeper.ClientCnxnSocketNetty SSL handler added for channel: [id: 0xa831f9c3]
2020-02-24 23:37:54,104 INFO [nioEventLoopGroup-3-1] o.apache.zookeeper.ClientCnxnSocketNetty channel is connected: [id: 0xa831f9c3, L:/172.17.0.4:56510 - R:8e38869cd1d1/172.17.0.3:2281]
Bootstrap Properties
The bootstrap.conf file in the conf
directory allows users to configure settings for how Clockspring should be started.
This includes parameters, such as the size of the Java Heap, what Java command to run, and Java System Properties.
Here, we will address the different properties that are made available in the file. Any changes to this file will take effect only after Clockspring has been stopped and restarted.
Property |
Description |
|
Specifies the fully qualified java command to run. By default, it is simply |
|
The username to run Clockspring as. For instance, if Clockspring should be run as the |
|
Whether or not to preserve shell environment while using |
|
The lib directory to use. By default, this is set to |
|
The conf directory to use. By default, this is set to |
|
When Clockspring is instructed to shut down, the Bootstrap will wait this number of seconds for the process to shut down cleanly. At this amount of time,
if the service is still running, the Bootstrap will |
|
Any number of JVM arguments can be passed to the JVM when the process is started. These arguments are defined by adding properties to bootstrap.conf that
begin with |
|
The root key (in hexadecimal format) for encrypted sensitive configuration values. When Clockspring is started, this root key is used to decrypt sensitive values from the clockspring.properties file into memory for later use. The Encrypt-Config Tool can be used to specify the root key, encrypt sensitive values in clockspring.properties and update bootstrap.conf. |
|
When Clockspring is started, or stopped, or when the Bootstrap detects that Clockspring has died, the Bootstrap is able to send notifications of these events to interested parties. This is configured by specifying an XML file that defines which notification services can be used. More about this file can be found in the Notification Services section. |
|
If a notification service is configured but is unable to perform its function, it will try again up to a maximum number of attempts. This property
configures what that maximum number of attempts is. The default value is |
|
This property is a comma-separated list of Notification Service identifiers that correspond to the Notification Services
defined in the |
|
This property is a comma-separated list of Notification Service identifiers that correspond to the Notification Services
defined in the |
|
This property is a comma-separated list of Notification Service identifiers that correspond to the Notification Services
defined in the |
|
(true or false) This property decides whether to run Clockspring diagnostics before shutting down. |
|
(true or false) This property decides whether to run Clockspring diagnostics in verbose mode. |
|
This property specifies the location of the Clockspring diagnostics directory. |
|
This property specifies the maximum permitted number of diagnostic files. If the limit is exceeded, the oldest files are deleted. |
|
This property specifies the maximum permitted size of the diagnostics directory. If the limit is exceeded, the oldest files are deleted. |
Notification Services
When the Bootstrap starts or stops Clockspring, or detects that Clockspring has died unexpectedly, it is able to notify configured recipients. Currently, the only mechanisms supplied are to send an e-mail or an HTTP POST notification. The notification services configuration file is an XML file where the notification capabilities are configured.
The default location of the XML file is conf/bootstrap-notification-services.xml, but this value can be changed in the conf/bootstrap.conf file.
The syntax of the XML file is as follows:
<services>
    <!-- any number of service elements can be defined. -->
    <service>
        <id>some-identifier</id>
        <!-- The fully-qualified class name of the Notification Service. -->
        <class>org.apache.nifi.bootstrap.notification.email.EmailNotificationService</class>

        <!-- Any number of properties can be set using this syntax.
             The properties available depend on the Notification Service. -->
        <property name="Property Name 1">Property Value</property>
        <property name="Another Property Name">Property Value 2</property>
    </service>
</services>
Once the desired services have been configured, they can then be referenced in the bootstrap.conf file.
Email Notification Service
The first Notifier is to send emails and the implementation is org.apache.nifi.bootstrap.notification.email.EmailNotificationService
.
It has the following properties available:
Property |
Required |
Description |
|
true |
The hostname of the SMTP Server that is used to send Email Notifications |
|
true |
The Port used for SMTP communications |
|
true |
Username for the SMTP account |
|
Password for the SMTP account |
|
|
Flag indicating whether authentication should be used |
|
|
Flag indicating whether TLS should be enabled |
|
|
|
|
|
X-Mailer used in the header of the outgoing email |
|
|
Mime Type used to interpret the contents of the email, such as |
|
|
true |
Specifies the Email address to use as the sender. Otherwise, a "friendly name" can be used as the From address, but the value must be enclosed in double-quotes. |
|
The recipients to include in the To-Line of the email |
|
|
The recipients to include in the CC-Line of the email |
|
|
The recipients to include in the BCC-Line of the email |
In addition to the properties above that are marked as required, at least one of the To
, CC
, or BCC
properties
must be set.
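The recipient rule is easy to overlook when editing the XML by hand. As a sketch, the Python function below checks a single service element for a non-empty To, CC, or BCC property. The function name is hypothetical, and the check covers only the recipient rule, not the other required properties.

```python
import xml.etree.ElementTree as ET

RECIPIENT_PROPERTIES = ("To", "CC", "BCC")


def has_recipient(service_xml):
    """Return True if the <service> element sets a non-empty To, CC, or BCC property."""
    service = ET.fromstring(service_xml)
    for prop in service.findall("property"):
        if prop.get("name") in RECIPIENT_PROPERTIES and (prop.text or "").strip():
            return True
    return False
```

Because the function parses the element first, it also raises an error if the XML itself is not well-formed, such as when a closing tag is missing its slash.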
A complete example of configuring the Email service would look like the following:
<service>
    <id>email-notification</id>
    <class>org.apache.nifi.bootstrap.notification.email.EmailNotificationService</class>
    <property name="SMTP Hostname">smtp.gmail.com</property>
    <property name="SMTP Port">587</property>
    <property name="SMTP Username">username@gmail.com</property>
    <property name="SMTP Password">super-secret-password</property>
    <property name="SMTP TLS">true</property>
    <property name="From">"Clockspring Service Notifier"</property>
    <property name="To">username@gmail.com</property>
</service>
HTTP Notification Service
The second Notifier is to send HTTP POST requests and the implementation is org.apache.nifi.bootstrap.notification.http.HttpNotificationService
.
It has the following properties available:
Property |
Required |
Description |
|
true |
The URL to send the notification to. Expression language is supported. |
|
Max wait time for connection to remote service. Expression language is supported. This defaults to |
|
|
Max wait time for remote service to read the request sent. Expression language is supported. This defaults to |
|
|
The fully-qualified filename of the Truststore |
|
|
The Type of the Truststore. Either |
|
|
The password for the Truststore |
|
|
The fully-qualified filename of the Keystore |
|
|
The Type of the Keystore. Either |
|
|
The password for the Keystore |
|
|
The password for the key. If this is not specified, but the Keystore Filename, Password, and Type are specified, then the Key Password will be assumed to be the same as the Keystore Password. |
|
|
The algorithm to use for this SSL context. This can either be |
In addition to the properties above, dynamic properties can be added. They will be added as headers to the HTTP request. Expression language is supported.
The notification message is in the body of the POST request. The type of notification is in the header "notification.type" and the subject uses the header "notification.subject".
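To show how a receiving endpoint might read these values, here is a minimal Python sketch of an HTTP handler built on the standard library. The handler class, the helper function, and the log format are all hypothetical. It serves plain HTTP only, so an HTTPS URL such as the one in the example configuration would need TLS added in front of it.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_notification(headers, body):
    """Combine the notification headers and the POST body into one log line."""
    kind = headers.get("notification.type", "unknown")
    subject = headers.get("notification.subject", "")
    return f"[{kind}] {subject}: {body.decode('utf-8', errors='replace')}"


class NotificationHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The notification message arrives as the request body.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        print(summarize_notification(self.headers, body))
        self.send_response(200)
        self.end_headers()


def serve(port):
    """Run the receiver until interrupted; the port is chosen by the caller."""
    HTTPServer(("0.0.0.0", port), NotificationHandler).serve_forever()
```

Calling `serve(8080)` starts the receiver on an arbitrarily chosen port. Any dynamic properties configured on the service would appear as additional request headers and can be read the same way.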
A complete example of configuring the HTTP service could look like the following:
<service>
    <id>http-notification</id>
    <class>org.apache.nifi.bootstrap.notification.http.HttpNotificationService</class>
    <property name="URL">https://testServer.com:8080/</property>
    <property name="Truststore Filename">localhost-ts.jks</property>
    <property name="Truststore Type">JKS</property>
    <property name="Truststore Password">localtest</property>
    <property name="Keystore Filename">localhost-ts.jks</property>
    <property name="Keystore Type">JKS</property>
    <property name="Keystore Password">localtest</property>
    <property name="notification.timestamp">${now()}</property>
</service>
System Properties
The clockspring.properties file in the conf
directory is the main configuration file for controlling how Clockspring runs. This section provides an overview of the properties in this file and their setting options.
Values for periods of time and data sizes must include the unit of measure, for example "10 secs" or "10 MB", not simply "10". |
After making changes to clockspring.properties, restart Clockspring in order for the changes to take effect. |
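The unit requirement can be checked mechanically before restarting. The Python sketch below only tests that a value consists of a number followed by some unit word; the pattern and function name are assumptions for illustration, and the sketch does not know which unit names Clockspring actually accepts.

```python
import re

# A number followed by a unit word, e.g. "10 secs" or "10 MB".
VALUE_WITH_UNIT = re.compile(r"\d+(?:\.\d+)?\s*[A-Za-z]+")


def has_unit(value):
    """Return True if the value is a number followed by a unit of measure."""
    return VALUE_WITH_UNIT.fullmatch(value.strip()) is not None
```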
Upgrade Recommendations
The contents of the clockspring.properties file are relatively stable but can change from version to version. It is always a good idea to review this file when upgrading and pay attention to any changes.
Consider configuring items below marked with an asterisk (*
) in such a way that upgrading will be easier. For example, change the default directory configurations to locations outside the main root installation. In this way, these items can remain in their configured location through an upgrade, allowing Clockspring to find all the repositories and configuration files and pick up where it left off as soon as the old version is stopped and the new version is started. Furthermore, the administrator may reuse this clockspring.properties file and any other configuration files without having to re-configure them each time an upgrade takes place. See Upgrading Clockspring for more details.
Core Properties
The first section of the clockspring.properties file is for the Core Properties. These properties apply to the core framework as a whole.
Property |
Description |
|
The location of the flow configuration file (i.e., the file that contains what is currently displayed on the Clockspring graph). The default value is |
|
Specifies whether Clockspring creates a backup copy of the flow automatically when the flow is updated. The default value is |
|
The location of the archive directory where backup copies of the flow.xml are saved. The default value is |
|
The lifespan of archived flow.xml files. Clockspring will delete expired archive files when it updates flow.xml if this property is specified. Expiration is determined based on current system time and the last modified timestamp of an archived flow.xml. If no archive limitation is specified in clockspring.properties, Clockspring removes archives older than |
|
The total data size allowed for the archived flow.xml files. Clockspring will delete the oldest archive files until the total archived file size becomes less than this configuration value, if this property is specified. If no archive limitation is specified in clockspring.properties, Clockspring uses |
|
The number of archive files allowed. Clockspring will delete the oldest archive files so that only N latest archives can be kept, if this property is specified. |
|
Indicates whether, upon restart, the components on the Clockspring graph should return to their last state. The default value is |
|
Indicates the shutdown period. The default value is |
|
When many changes are made to the flow.xml, this property specifies how long to wait before writing out the changes, so as to batch the changes into a single write. The default value is |
|
If a component allows an unexpected exception to escape, it is considered a bug. As a result, the framework will pause (or administratively yield) the component for this amount of time. This is done so that the component does not use up massive amounts of system resources, since it is known to have problems in the existing state. The default value is |
|
When a component has no work to do (i.e., is "bored"), this is the amount of time it will wait before checking to see if it has new data to work on. This way, it does not use up CPU resources by checking for new work too often. When setting this property, be aware that it could add extra latency for components that do not constantly have work to do, as once they go into this "bored" state, they will wait this amount of time before checking for more work. The default value is |
|
When drawing a new connection between two components, this is the default value for that connection’s back pressure object threshold. The default is |
|
When drawing a new connection between two components, this is the default value for that connection’s back pressure data size threshold. The default is |
|
This is the location of the file that specifies how authorizers are defined. The default value is |
|
This is the location of the file that specifies how username/password authentication is performed. This file is
only considered if |
|
This is the location of the directory where flow templates are saved (for backward compatibility only). Templates are stored in the flow.xml.gz. The template directory can be used to (bulk) import templates into the flow.xml.gz automatically on Clockspring startup. The default value is |
|
This is banner text that may be configured to display at the top of the User Interface. It is blank by default. |
|
The interval at which the User Interface auto-refreshes. The default value is |
|
The location of the nar library. The default value is |
|
The location of the nar working directory. The default value is |
|
The documentation working directory. The default value is |
|
If set to |
|
Time to wait for a Processor’s life-cycle operation ( |
State Management
The State Management section of the Properties file provides a mechanism for configuring local and cluster-wide mechanisms for components to persist state. See the State Management section for more information on how this is used.
Property |
Description |
|
The XML file that contains configuration for the local and cluster-wide State Providers. The default value is |
|
The ID of the Local State Provider to use. This value must match the value of the |
|
The ID of the Cluster State Provider to use. This value must match the value of the |
|
Specifies whether or not this instance of Clockspring should start an embedded ZooKeeper Server. This is used in conjunction with the ZooKeeperStateProvider. |
|
Specifies a properties file that contains the configuration for the embedded ZooKeeper Server that is started (if the |
H2 Settings
The H2 Settings section defines the settings for the H2 database, which keeps track of user access and flow controller history.
Property |
Description |
|
The location of the H2 database directory. The default value is |
|
This property specifies additional arguments to add to the connection string for the H2 database. The default value should be used and should not be changed. It is: |
FlowFile Repository
The FlowFile repository keeps track of the attributes and current state of each FlowFile in the system. By default, this repository is installed in the same root installation directory as all the other repositories; however, it is advisable to configure it on a separate drive if available.
There are currently three implementations of the FlowFile Repository, which are detailed below.
Property |
Description |
|
The FlowFile Repository implementation. The default value is |
Switching repository implementations should only be done on an instance with zero queued FlowFiles, and should only be done with caution. |
Write Ahead FlowFile Repository
WriteAheadFlowFileRepository is the default implementation. It persists FlowFiles to disk, and can optionally be configured to synchronize all changes to disk. Synchronizing every change is very expensive and can significantly reduce Clockspring performance. However, if synchronization is set to false, data could be lost after a sudden power loss or an operating system crash. The default value is false.
Property |
Description |
|
If the repository implementation is configured to use the |
|
The location of the FlowFile Repository. The default value is |
|
The FlowFile Repository checkpoint interval. The default value is |
|
If set to |
Encrypted Write Ahead FlowFile Repository Properties
All of the properties defined above (see Write Ahead FlowFile Repository) still apply. Only encryption-specific properties are listed here. See Encrypted FlowFile Repository in the User Guide for more information.
Unlike the encrypted content and provenance repositories, the repository implementation does not change here, only the underlying write-ahead log implementation. This allows for cleaner separation and more flexibility in implementation selection. The property that should be changed to enable encryption is nifi.flowfile.repository.wal.implementation.
|
Property |
Description |
|
This is the fully-qualified class name of the key provider. A key provider is the datastore interface for accessing the encryption key to protect the content claims. There are currently three implementations: |
|
The path to the key definition resource (empty for |
|
The password used for decrypting the key definition resource, such as the keystore for |
|
The active key ID to use for encryption (e.g. |
|
The key to use for |
|
Allows for additional keys to be specified for the |
The simplest configuration is below:
nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.WriteAheadFlowFileRepository
nifi.flowfile.repository.wal.implementation=org.apache.nifi.wali.EncryptedSequentialAccessWriteAheadLog
nifi.flowfile.repository.encryption.key.provider.implementation=org.apache.nifi.security.kms.StaticKeyProvider
nifi.flowfile.repository.encryption.key.provider.location=
nifi.flowfile.repository.encryption.key.id=Key1
nifi.flowfile.repository.encryption.key=0123456789ABCDEFFEDCBA98765432100123456789ABCDEFFEDCBA9876543210
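The key in the configuration above is a placeholder made of 64 hexadecimal characters (256 bits) and should not be reused in a real deployment. The following is a minimal sketch for generating a random value of the same shape. The function name generate_hex_key is hypothetical, and the sketch assumes a hex-encoded 256-bit key is what is wanted, which matches the example but is not otherwise confirmed here.

```python
import secrets


def generate_hex_key(num_bytes: int = 32) -> str:
    """Return a cryptographically random key as uppercase hex text.

    32 bytes gives 64 hex characters, the same length as the
    placeholder key shown in the example configuration.
    """
    return secrets.token_hex(num_bytes).upper()


if __name__ == "__main__":
    key = generate_hex_key()
    # Print in the form used by the properties file.
    print(f"nifi.flowfile.repository.encryption.key={key}")
```

The generated value would replace the placeholder on the nifi.flowfile.repository.encryption.key line.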
Swap Management
Clockspring keeps FlowFile information in memory (the JVM) but during surges of incoming data, the FlowFile information can start to take up so much of the JVM that system performance suffers. To counteract this effect, Clockspring "swaps" the FlowFile information to disk temporarily until more JVM space becomes available again. These properties govern how that process occurs.
Property |
Description |
|
The Swap Manager implementation. The default value is |
|
The queue threshold at which Clockspring starts to swap FlowFile information to disk. The default value is |
Content Repository
The Content Repository holds the content for all the FlowFiles in the system. By default, it is installed in the same root installation directory as all the other repositories; however, administrators will likely want to configure it on a separate drive if available. If nothing else, it is best if the Content Repository is not on the same drive as the FlowFile Repository. In dataflows that handle a large amount of data, the Content Repository could fill up a disk and the FlowFile Repository, if also on that disk, could become corrupt. To avoid this situation, configure these repositories on different drives.
Property |
Description |
|
The Content Repository implementation. The default value is |
File System Content Repository Properties
Property |
Description |
|
The Content Repository implementation. The default value is |
|
When Clockspring processes many small FlowFiles, the contents of those FlowFiles are stored in the content repository, but we do not store the content of each
individual FlowFile as a separate file in the content repository. Doing so would be very detrimental to performance, if each 120 byte FlowFile, for instance, was written to its own file. Instead,
we continue writing to the same file until it reaches some threshold. This property configures that threshold. Setting the value too small can result in poor performance due to reading from and
writing to too many files. However, a file can only be deleted from the content repository once there are no longer any FlowFiles pointing to it. Therefore, setting the value too large can result
in data remaining in the content repository for much longer, potentially leading to the content repository running out of disk space. The default value is |
|
The location of the Content Repository. The default value is |
|
If archiving is enabled (see |
|
If archiving is enabled (see |
|
To enable content archiving, set this to |
|
If set to |
|
The URL for a web-based content viewer if one is available. It is blank by default. |
Encrypted File System Content Repository Properties
All of the properties defined above (see File System Content Repository Properties) still apply. Only encryption-specific properties are listed here. See Encrypted Content Repository in the User Guide for more information.
Property |
Description |
|
This is the fully-qualified class name of the key provider. A key provider is the datastore interface for accessing the encryption key to protect the content claims. There are currently three implementations: |
|
The path to the key definition resource (empty for |
|
The password used for decrypting the key definition resource, such as the keystore for |
|
The active key ID to use for encryption (e.g. |
|
The key to use for |
|
Allows for additional keys to be specified for the |
Provenance Repository
The Provenance Repository contains the information related to Data Provenance. The next four sections are for Provenance Repository properties.
Property |
Description |
|
The Provenance Repository implementation. The default value is A third and fourth option are available: The NOTE: The |
Write Ahead Provenance Repository Properties
Property |
Description |
|
The location of the Provenance Repository. The default value is |
|
The maximum amount of time to keep data provenance information. The default value is |
|
The maximum amount of data provenance information to store at a time.
The default value is |
|
The amount of data to write to a single "event file." The default value is |
|
The number of threads to use for Provenance Repository queries. The default value is |
|
The number of threads to use for indexing Provenance events so that they are searchable. The default value is |
|
Indicates whether to compress the provenance information when an "event file" is rolled over. The default value is |
|
If set to |
|
This is a comma-separated list of the fields that should be indexed and made searchable.
Fields that are not indexed will not be searchable. Valid fields are: |
|
This is a comma-separated list of FlowFile Attributes that should be indexed and made searchable. It is blank by default.
But some good examples to consider are |
|
The repository uses Apache Lucene to perform indexing and searching. This value indicates how large a Lucene Index should
become before the Repository starts writing to a new Index. Large values for the shard size will result in more Java heap usage when searching the Provenance Repository but should
provide better performance. The default value is NOTE: This value should be smaller than (no more than half of) the |
|
Indicates the maximum length that a FlowFile attribute can be when retrieving a Provenance Event from the repository.
If the length of any attribute exceeds this value, it will be truncated when the event is retrieved. The default value is |
|
Apache Lucene creates several "segments" in an Index. These segments are periodically merged together in order to provide faster
querying. This property specifies the maximum number of threads that are allowed to be used for each of the storage directories. The default value is |
|
Each time that a Provenance query is run, the query must first search the Apache Lucene indices (at least, in most cases - there are some queries that are run often and the results are cached to avoid searching the Lucene indices). When a Lucene index is opened for the first time, it can be very expensive and take several seconds. This is compounded by having many different indices, and can result in a Provenance query taking much longer. After the index has been opened, the Operating System’s disk cache will typically hold onto enough data to make re-opening the index much faster - at least for a period of time, until the disk cache evicts this data. If this value is set, Clockspring will periodically open each Lucene index and then close it, in order to "warm" the cache. This will result in far faster queries when the Provenance Repository is large. As with all great things, though, it comes with a cost. Warming the cache does take some CPU resources, but more importantly it will evict other data from the Operating System disk cache and will result in reading (potentially a great deal of) data from the disk. This can result in lower Clockspring performance. However, if Clockspring is running in an environment where CPU and disk are not fully utilized, this feature can result in far faster Provenance queries. The default value for this property is blank (i.e. disabled). |
Encrypted Write Ahead Provenance Repository Properties
All of the properties defined above (see Write Ahead Provenance Repository Properties) still apply. Only encryption-specific properties are listed here. See Encrypted Provenance Repository in the User Guide for more information.
Property |
Description |
|
This is the fully-qualified class name of the key provider. A key provider is the datastore interface for accessing the encryption key to protect the provenance events. There are currently three implementations: |
|
The path to the key definition resource (empty for |
|
The password used for decrypting the key definition resource, such as the keystore for |
|
The active key ID to use for encryption (e.g. |
|
The key to use for |
|
Allows for additional keys to be specified for the |
The simplest configuration is below:
nifi.provenance.repository.implementation=org.apache.nifi.provenance.EncryptedWriteAheadProvenanceRepository
nifi.provenance.repository.encryption.key.provider.implementation=org.apache.nifi.security.kms.StaticKeyProvider
nifi.provenance.repository.encryption.key.provider.location=
nifi.provenance.repository.encryption.key.id=Key1
nifi.provenance.repository.encryption.key=0123456789ABCDEFFEDCBA98765432100123456789ABCDEFFEDCBA9876543210
Persistent Provenance Repository Properties
Property |
Description |
|
The location of the Provenance Repository. The default value is |
|
The maximum amount of time to keep data provenance information. The default value is |
|
The maximum amount of data provenance information to store at a time. The default value is |
|
The amount of time to wait before rolling over the latest data provenance information so that it is available in the User Interface. The default value is |
|
The amount of information to roll over at a time. The default value is |
|
The number of threads to use for Provenance Repository queries. The default value is |
|
The number of threads to use for indexing Provenance events so that they are searchable. The default value is |
|
Indicates whether to compress the provenance information when rolling it over. The default value is |
|
If set to |
|
The number of journal files that should be used to serialize Provenance Event data. Increasing this value will allow more tasks to simultaneously update the repository but will result in more expensive merging of the journal files later. This value should ideally be equal to the number of threads that are expected to update the repository simultaneously, but 16 tends to work well in most environments. The default value is |
|
This is a comma-separated list of the fields that should be indexed and made searchable. Fields that are not indexed will not be searchable. Valid fields are: |
|
This is a comma-separated list of FlowFile Attributes that should be indexed and made searchable. It is blank by default. But some good examples to consider are |
|
Large values for the shard size will result in more Java heap usage when searching the Provenance Repository but should provide better performance. The default value is |
|
Indicates the maximum length that a FlowFile attribute can be when retrieving a Provenance Event from the repository. If the length of any attribute exceeds this value, it will be truncated when the event is retrieved. The default value is |
Volatile Provenance Repository Properties
Property |
Description |
|
The Provenance Repository buffer size. The default value is |
Status History Repository
The Status History Repository contains the information for the Component Status History and the Node Status History tools in the User Interface. The following properties govern how these tools work.
Property |
Description |
|
The Status History Repository implementation. The default value is |
|
This value indicates how often to capture a snapshot of the components' status history. The default value is |
In memory repository
If the value of the property nifi.components.status.repository.implementation
is VolatileComponentStatusRepository
, the
status history data will be stored in memory. If the application stops, all gathered information will be lost.
The buffer.size
and snapshot.frequency
work together to determine the amount of historical data to retain. As an example, to
configure two days' worth of historical data with a data point snapshot occurring every 5 minutes you would configure
snapshot.frequency
to be "5 mins" and the buffer.size to be "576". To further explain this example, for every 60 minutes there
are 12 (60 / 5) snapshot windows for that time period. To keep that data for 48 hours (12 * 48) you end up with a buffer size
of 576.
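The same arithmetic can be written as a small helper for other retention periods. This is an illustrative sketch only: the function name status_buffer_size and its parameters are hypothetical, and it rounds up when the snapshot interval does not divide the retention period evenly.

```python
import math


def status_buffer_size(retention_hours: float, snapshot_minutes: float) -> int:
    """Number of snapshots needed to cover the retention period.

    One snapshot is taken every `snapshot_minutes`, so the buffer must
    hold (retention in minutes) / (snapshot interval) entries.
    """
    if retention_hours <= 0 or snapshot_minutes <= 0:
        raise ValueError("retention and snapshot interval must be positive")
    return math.ceil(retention_hours * 60 / snapshot_minutes)


if __name__ == "__main__":
    # Two days of history at one snapshot every 5 minutes.
    print(status_buffer_size(retention_hours=48, snapshot_minutes=5))
```

For the two-day, 5-minute case described above, the helper returns 576.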
Property |
Description |
|
Specifies the buffer size for the Status History Repository. The default value is |
Persistent repository
If the value of the property nifi.components.status.repository.implementation is EmbeddedQuestDbStatusHistoryRepository, the status history data will be stored on disk in a persistent manner. Data will be kept between restarts.
Property |
Description |
|
The number of days the node status data (such as Repository disk space free, garbage collection information, etc.) will be kept. The default value is |
|
The number of days the component status data (i.e., stats for each Processor, Connection, etc.) will be kept. The default value is |
|
The location of the persistent Status History Repository. The default value is |
Site to Site Properties
These properties govern how this instance of Clockspring communicates with remote instances of Clockspring when Remote Process Groups are configured in the dataflow.
Remote Process Groups can use either the RAW or the HTTP transport protocol. Properties named nifi.remote.input.socket.* are specific to the RAW transport protocol. Similarly, properties named nifi.remote.input.http.* are specific to the HTTP transport protocol.
Property |
Description |
|
The host name that will be given out to clients to connect to this Clockspring instance for Site-to-Site communication. By default, it is the value from |
|
This indicates whether communication between this instance of Clockspring and remote Clockspring instances should be secure. By default, it is set to |
|
The remote input socket port for Site-to-Site communication. By default, it is blank, but it must have a value in order to use RAW socket as transport protocol for Site-to-Site. |
|
Specifies whether HTTP Site-to-Site should be enabled on this host. By default, it is set to |
|
Specifies how long a transaction can stay alive on the server. By default, it is set to |
|
Specifies how long Clockspring should cache information about a remote Clockspring instance when communicating via Site-to-Site. By default, Clockspring will cache the |
Web Properties
These properties pertain to the web-based User Interface.
Property |
Description |
|
The HTTP host. The default value is blank. |
|
The HTTP port. The default value is blank. |
|
The port which forwards incoming HTTP requests to |
|
The name of the network interface to which Clockspring should bind for HTTP requests. It is blank by default. |
|
The HTTPS host. The default value is |
|
The HTTPS port. The default value is |
|
Same as |
|
Cipher suites used to initialize the SSLContext of the Jetty HTTPS port. If unspecified, the runtime SSLContext defaults are used. |
|
Cipher suites that may not be used by an SSL client to establish a connection to Jetty. If unspecified, the runtime SSLContext defaults are used. In Chrome, the SSL cipher negotiated with Jetty may be examined in the 'Developer Tools' plugin, in the 'Security' tab. In Firefox, the SSL cipher negotiated with Jetty may be examined in the 'Secure Connection' widget found to the left of the URL in the browser address bar. |
|
The name of the network interface to which Clockspring should bind for HTTPS requests. It is blank by default. |
|
The space-separated list of application protocols supported when running with HTTPS enabled. The default value is The value can be set to The value can be set to |
|
The location of the Jetty working directory. The default value is |
|
The number of Jetty threads. The default value is |
|
The maximum size allowed for request and response headers. The default value is |
|
A comma separated list of allowed HTTP Host header values to consider when Clockspring is running securely and will be receiving requests to a different host[:port] than it is bound to. For example, when running in a Docker container or behind a proxy (e.g. localhost:18443, proxyhost:443). By default, this value is blank meaning Clockspring should only allow requests sent to the host[:port] that Clockspring is bound to. |
|
A comma separated list of allowed HTTP X-ProxyContextPath, X-Forwarded-Context, or X-Forwarded-Prefix header values to consider. By default, this value is blank meaning all requests containing a proxy context path are rejected. Configuring this property would allow requests where the proxy path is contained in this listing. |
|
The maximum size (HTTP |
|
The maximum number of requests from a connection per second. Requests in excess of this are first delayed, then throttled. |
|
The maximum number of requests for login Access Tokens from a connection per second. Requests in excess of this are rejected with HTTP 429. |
|
A comma separated list of IP addresses. Used to specify the IP addresses of clients which can exceed the maximum requests per second ( |
|
The request timeout for web requests. Requests running longer than this time will be forced to end with an HTTP 503 Service Unavailable response. Default value is |
Security Properties
These properties pertain to various security features in Clockspring. Many of these properties are covered in more detail in the Security Configuration section of this Administrator’s Guide.
Property |
Description |
|
This is the password used to encrypt any sensitive property values that are configured in processors. By default, it is blank, but the system administrator should provide a value for it. It can be a string of any length, although the recommended minimum length is 10 characters. Be aware that once this password is set and one or more sensitive processor properties have been configured, this password should not be changed. |
|
The algorithm used to encrypt sensitive properties. The default value is |
|
The sensitive property provider. The default value is |
|
The comma separated list of properties in clockspring.properties to encrypt in addition to the default sensitive properties (see Encrypted Passwords in Configuration Files). |
|
Specifies whether the SSL context factory should be automatically reloaded if updates to the keystore and truststore are detected. By default, it is set to |
|
Specifies the interval at which the keystore and truststore are checked for updates. Only applies if |
|
The full path and name of the keystore. It is blank by default. |
|
The keystore type. It is blank by default. |
|
The keystore password. It is blank by default. |
|
The key password. It is blank by default. |
|
The full path and name of the truststore. It is blank by default. |
|
The truststore type. It is blank by default. |
|
The truststore password. It is blank by default. |
|
Specifies which of the configured Authorizers in the authorizers.xml file to use. By default, it is set to |
|
Whether anonymous authentication is allowed when running over HTTPS. If set to true, client certificates are not required to connect via TLS. |
|
This indicates what type of login identity provider to use. The default value is blank. It can be set to the identifier of a provider
in the file specified in |
|
This is the URL for the Online Certificate Status Protocol (OCSP) responder if one is being used. It is blank by default. |
|
This is the location of the OCSP responder certificate if one is being used. It is blank by default. |
Identity Mapping Properties
These properties can be used to normalize user identities. When implemented, identities authenticated by different identity providers (certificates, LDAP, Kerberos) are treated the same internally in Clockspring. As a result, duplicate users are avoided and user-specific configurations such as authorizations only need to be set up once per user.
The following examples demonstrate normalizing DNs from certificates and principals from Kerberos:
nifi.security.identity.mapping.pattern.dn=^CN=(.*?), OU=(.*?), O=(.*?), L=(.*?), ST=(.*?), C=(.*?)$
nifi.security.identity.mapping.value.dn=$1@$2
nifi.security.identity.mapping.transform.dn=NONE
nifi.security.identity.mapping.pattern.kerb=^(.*?)/instance@(.*?)$
nifi.security.identity.mapping.value.kerb=$1@$2
nifi.security.identity.mapping.transform.kerb=NONE
The last segment of each property is an identifier used to associate the pattern with the replacement value. When a user makes a request to Clockspring, their identity is checked against each of those patterns in lexicographical order of identifier. For the first pattern that matches, the replacement specified in the nifi.security.identity.mapping.value.xxxx property is used. So a login with CN=localhost, OU=Clockspring, O=Clockspring, L=Oxon Hill, ST=MD, C=US matches the DN mapping pattern above, the DN mapping value $1@$2 is applied, and the identity is normalized to localhost@Clockspring.
In addition to mapping, a transform may be applied. The supported transforms are NONE (no transform applied), LOWER (identity lowercased), and UPPER (identity uppercased). If not specified, the default value is NONE.
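The matching order and the effect of the replacement value and transform can be sketched in a few lines of Python. This is an illustration of the behavior described above, not Clockspring's implementation: the names MAPPINGS and map_identity are hypothetical, Python's regex engine stands in for Java's, the $1-style references are converted to Python's form, and returning an unmatched identity unchanged is an assumption.

```python
import re

# Keyed by the identifier in the last segment of each property name:
# identifier -> (pattern, replacement value, transform)
MAPPINGS = {
    "dn": (r"^CN=(.*?), OU=(.*?), O=(.*?), L=(.*?), ST=(.*?), C=(.*?)$", "$1@$2", "NONE"),
    "kerb": (r"^(.*?)/instance@(.*?)$", "$1@$2", "NONE"),
}


def map_identity(identity, mappings=MAPPINGS):
    """Apply the first matching mapping, trying identifiers in sorted order."""
    for identifier in sorted(mappings):
        pattern, value, transform = mappings[identifier]
        match = re.match(pattern, identity)
        if match is None:
            continue
        # Convert Java-style group references ($1) to Python's (\g<1>).
        template = re.sub(r"\$(\d+)", r"\\g<\1>", value)
        mapped = match.expand(template)
        if transform == "LOWER":
            return mapped.lower()
        if transform == "UPPER":
            return mapped.upper()
        return mapped
    # Assumption: an identity that matches no pattern is left as-is.
    return identity


if __name__ == "__main__":
    dn = "CN=localhost, OU=Clockspring, O=Clockspring, L=Oxon Hill, ST=MD, C=US"
    print(map_identity(dn))
```

Because identifiers are tried in sorted order, the dn mapping is checked before kerb; a distinguished name therefore never reaches the Kerberos pattern.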
These mappings are also applied to the "Initial Admin Identity", "Cluster Node Identity", and any legacy users in the authorizers.xml file as well as users imported from LDAP (See Authorizers.xml Setup). |
Group names can also be mapped. The following example will accept the existing group name but will lowercase it. This may be helpful when used in conjunction with an external authorizer.
nifi.security.group.mapping.pattern.anygroup=^(.*)$
nifi.security.group.mapping.value.anygroup=$1
nifi.security.group.mapping.transform.anygroup=LOWER
These mappings are applied to any legacy groups referenced in the authorizers.xml as well as groups imported from LDAP. |
Cluster Common Properties
When setting up a cluster, these properties should be configured the same way on all nodes.
Property |
Description |
|
The interval at which nodes should emit heartbeats to the Cluster Coordinator. The default value is |
|
Maximum number of heartbeats a Cluster Coordinator can miss for a node in the cluster before the Cluster Coordinator updates the node status to Disconnected. The default value is |
|
This indicates whether cluster communications are secure. The default value is |
Cluster Node Properties
Configure these properties for cluster nodes.
Property |
Description |
|
Set this to |
|
The fully qualified address of the node. It is blank by default. |
|
The node’s protocol port. It is blank by default. |
|
The maximum number of threads that should be used to communicate with other nodes in the cluster. This property defaults to |
|
When the state of a node in the cluster is changed, an event is generated
and can be viewed in the Cluster page. This value indicates how many events to keep in memory for each node. The default value is |
|
When connecting to another node in the cluster, specifies how long this node should wait before considering
the connection a failure. The default value is |
|
When communicating with another node in the cluster, specifies how long this node should wait to receive information
from the remote node before considering the communication with the node a failure. The default value is |
|
The maximum number of outstanding web requests that can be replicated to nodes in the cluster. If this number of requests is exceeded, the embedded Jetty server will return a "409: Conflict" response. This property defaults to |
|
The location of the node firewall file. This is a file that may be used to list all the nodes that are allowed to connect to the cluster. It provides an additional layer of security. This value is blank by default, meaning that no firewall file is to be used. See Cluster Firewall Configuration for file format details. |
|
Specifies the amount of time to wait before electing a Flow as the "correct" Flow. If the number of Nodes that have voted is equal to the number specified
by the |
|
Specifies the number of Nodes required in the cluster to cause early election of Flows. This allows the Nodes in the cluster to avoid having to wait a long time before starting processing if we reach at least this number of nodes in the cluster. |
|
Specifies the port to listen on for incoming connections for load balancing data across the cluster. The default value is |
|
Specifies the hostname to listen on for incoming connections for load balancing data across the cluster. If not specified, will default to the value used by the
|
|
The maximum number of connections to create between this node and each other node in the cluster. For example, if there are 5 nodes in the cluster and this value is set to 4, there will be up to 20 socket connections established for load-balancing purposes (5 x 4 = 20). The default value is |
|
The maximum number of threads to use for transferring data from this node to other nodes in the cluster. While a given thread can only write to a single socket at a time, a single thread is capable of servicing multiple connections simultaneously because a given connection may not be available for reading/writing at any given time. The default value is NOTE: Increasing this value will allow additional threads to be used for communicating with other nodes in the cluster and writing the data to the Content and FlowFile Repositories. However, if this property is set to a value greater than the number of nodes in the cluster multiplied by the number of connections per node ( |
|
When communicating with another node, if this amount of time elapses without making any progress when reading from or writing to a socket, then a TimeoutException will be thrown. This will then result in the data either being retried or sent to another node in the cluster, depending on the configured Load Balancing Strategy. The default value is |
ZooKeeper Properties
Clockspring depends on Apache ZooKeeper for determining which node in the cluster should play the role of Primary Node and which node should play the role of Cluster Coordinator. These properties must be configured in order for Clockspring to join a cluster.
Property |
Description |
|
The Connect String that is needed to connect to Apache ZooKeeper. This is a comma-separated list
of hostname:port pairs. For example, |
|
How long to wait when connecting to ZooKeeper before considering the connection a failure. The default value is |
|
How long to wait after losing a connection to ZooKeeper before the session is expired. The default value is |
|
The root ZNode that should be used in ZooKeeper. ZooKeeper provides a directory-like structure
for storing data. Each 'directory' in this structure is referred to as a ZNode. This denotes the root ZNode, or 'directory',
that should be used for storing data. The default value is |
|
Whether to access ZooKeeper using client TLS. |
|
Filename of the Keystore containing the private key to use when communicating with ZooKeeper. |
|
Optional. The type of the Keystore. Must be |
|
The password for the Keystore. |
|
Filename of the Truststore that will be used to verify the ZooKeeper server(s). |
|
Optional. The type of the Truststore. Must be |
|
The password for the Truststore. |
|
Maximum buffer size in bytes for packets sent to and received from ZooKeeper.
Defaults to The ZooKeeper Administrator’s Guide categorizes this property as an unsafe option.
Changing this property requires setting |
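The Connect String described in the table above is a comma-separated list of hostname:port pairs. A quick sanity check before starting a node could look like the sketch below. The function name parse_connect_string and the host names in the usage line are hypothetical, and the check is deliberately simple: it only verifies the hostname:port shape and does not try to reach ZooKeeper.

```python
def parse_connect_string(connect_string):
    """Split a comma-separated list of hostname:port pairs.

    Returns a list of (hostname, port) tuples and raises ValueError
    for any entry that is not of the form hostname:port.
    """
    pairs = []
    for entry in connect_string.split(","):
        entry = entry.strip()
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"not a hostname:port pair: {entry!r}")
        pairs.append((host, int(port)))
    return pairs


if __name__ == "__main__":
    # Hypothetical three-node ensemble.
    for host, port in parse_connect_string("zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"):
        print(host, port)
```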
Kerberos Properties
Property |
Description |
|
The location of the krb5 file, if used. It is blank by default. At this time, only a single krb5 file is allowed to
be specified per Clockspring instance, so this property is configured here to support SPNEGO and service principals rather than in individual Processors.
If necessary the krb5 file can support multiple realms.
Example: |
|
The name of the Clockspring Kerberos service principal, if used. It is blank by default. Note that this property is for Clockspring to authenticate as a client to other systems.
Example: |
|
The file path of the Clockspring Kerberos keytab, if used. It is blank by default. Note that this property is for Clockspring to authenticate as a client to other systems.
Example: |
|
The name of the Clockspring Kerberos service principal, if used. It is blank by default. Note that this property is used to authenticate Clockspring users.
Example: |
|
The file path of the Clockspring Kerberos keytab, if used. It is blank by default. Note that this property is used to authenticate Clockspring users.
Example: |
|
The expiration duration of a successful Kerberos user authentication, if used. The default value is |
Analytics Properties
These properties determine the behavior of the internal Clockspring predictive analytics capability, such as backpressure prediction, and should be configured the same way on all nodes.
Property |
Description |
|
This indicates whether prediction should be enabled for the cluster. The default is |
|
The time interval for which analytical predictions (e.g. queue saturation) should be made. The default value is |
|
The time interval to query for past observations (e.g. the last 3 minutes of snapshots). The default value is |
|
The implementation class for the status analytics model used to make connection predictions. The default value is |
|
The name of the scoring type that should be used to evaluate the model. The default value is |
|
The threshold for the scoring value (where model score should be above given threshold). The default value is |
Runtime Monitoring Properties
The Long-Running Task Monitor periodically checks the Clockspring processor executor threads and produces warning logs and bulletin messages for those that have been running for an extended period of time.
It can be used to detect processor tasks that may be stuck or hanging.
Note the performance impact of the task monitor: it creates a thread dump on every run, which may affect normal flow execution.
The Long-Running Task Monitor is disabled by default, and it remains disabled as long as no values are defined for its properties.
To enable it, both nifi.monitor.long.running.task.schedule
and nifi.monitor.long.running.task.threshold
properties need to be configured with valid time periods.
Property |
Description |
|
The time period between successive executions of the Long-Running Task Monitor (e.g. |
|
The time period beyond which a task is considered long-running, i.e. stuck / hanging (e.g. |
Performance Tracking Properties
NiFi exposes a large number of metrics by default through the User Interface. However, there are sometimes additional metrics that may aid in diagnosing bottlenecks and improving the performance of the NiFi dataflow.
The nifi.performance.tracking.percentage
property can be used to enable the tracking of additional metrics. Gathering these metrics, however, requires system calls, which can be
expensive on some systems. As a result, this property defaults to a value of 0
, indicating that the metrics should be captured 0% of the time. I.e., the feature is disabled by
default. To enable this feature, set the value of this property to an integer value in the range of 0 to 100, inclusive. This represents what percentage of the time NiFi should
gather these metrics.
For example, if the value is set to 20, then NiFi will gather these metrics for each processor approximately 20% of the times that the Processor is run. The remainder of the time, it will use the values that it has already captured in order to extrapolate the metrics to additional runs.
The metrics that are gathered include what percentage of the time the processor is utilizing the CPU (versus waiting for I/O to complete or blocking due to monitor/lock contention), what percentage of time the Processor spends reading from the Content Repository, writing to the Content Repository, blocked due to Garbage Collection, etc.
So, continuing our example, if we set the value of the nifi.performance.tracking.percentage
property to 20 and a processor is triggered to run 1,000 times, then NiFi will measure how much CPU
time was consumed over the 200 iterations during which it was measured (i.e., 20% of 1,000). Let’s say that this amounts to 500 milliseconds of CPU time. Additionally, let’s consider
that the Processor took 5,000 milliseconds to complete those 200 invocations because most of the time was spent blocking on Socket I/O. From this, NiFi will calculate that the CPU
is used approximately 10% of the time (500 / 5,000 * 100%). Now, let’s consider that in order to complete all 1,000 invocations the Processor took 35 seconds. NiFi will calculate,
then, that the Processor has used approximately 3.5 seconds (or 3500 milliseconds) of CPU time.
As a result, if we set the value of this property higher, up to a value of 100
, we will get more accurate results. However, it may be more expensive to monitor.
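The extrapolation in the worked example above can be reproduced with a few lines of shell arithmetic. This is only a sketch of the calculation as described in the text, not the code the product runs; the variable names are hypothetical and all values are the ones used in the example, in milliseconds.

```shell
#!/bin/sh
# Figures taken from the worked example in the text (milliseconds).
sampled_cpu_ms=500      # CPU time measured over the 200 sampled invocations
sampled_wall_ms=5000    # elapsed time of those 200 invocations
total_wall_ms=35000     # elapsed time of all 1,000 invocations

# Share of the sampled time that was spent on the CPU, as a whole percentage.
cpu_percent=$(( sampled_cpu_ms * 100 / sampled_wall_ms ))

# Apply that share to the full run to estimate the total CPU time.
estimated_cpu_ms=$(( total_wall_ms * cpu_percent / 100 ))

echo "CPU utilization: ${cpu_percent}%"               # prints "CPU utilization: 10%"
echo "Estimated CPU time: ${estimated_cpu_ms} ms"     # prints "Estimated CPU time: 3500 ms"
```

A higher sampling percentage enlarges the measured share of invocations, so the extrapolated figure rests on more data.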
In order to view these metrics, we can gather diagnostics by running the command nifi.sh diagnostics <filename>
and inspecting the generated file. See [nifi_diagnostics] for more information.
Custom Properties
To configure custom properties for use with Clockspring's Expression Language:
-
Create the custom property. Ensure that:
-
Each custom property has a distinct name, so that it is not overridden by existing environment properties, system properties, or FlowFile attributes.
-
Each node in a clustered environment is configured with the same custom properties.
-
-
Update
nifi.variable.registry.properties
with the location of the custom property file(s):
Property |
Description |
|
This is a comma-separated list of file location paths for one or more custom property files. |
-
Restart your Clockspring instance(s) for the updates to be picked up.
Custom properties can also be configured in the Clockspring UI. See the Variables Window section in the User Guide for more information.
Upgrading Clockspring
The instructions below are general steps to follow when upgrading from one release to another.
All nodes in a cluster must be upgraded to the same Clockspring version as nodes with different Clockspring versions are not supported in the same cluster. |
Preserve Custom Processors
If you have any custom NARs, preserve them during upgrade by storing them in a centralized location as follows:
-
Create a second library directory called
custom_lib
. -
Move your custom NARs to this new lib directory.
-
Add a new line to the clockspring.properties file to specify this new lib directory:
nifi.nar.library.directory=./lib
nifi.nar.library.directory.custom=/opt/configuration_resources/custom_lib
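The three steps above might look like the following in a shell session. This is a minimal sketch under stated assumptions: a scratch directory stands in for the real installation so it can be run safely, and the NAR name `my-processor.nar` is hypothetical. On a real system, replace the paths with your installation directory and the external library location.

```shell
#!/bin/sh
set -eu

# Assumption: scratch paths stand in for the real installation and library location.
SCRATCH="$(mktemp -d)"
INSTALL_DIR="$SCRATCH/clockspring"
CUSTOM_LIB="$SCRATCH/configuration_resources/custom_lib"
CUSTOM_NAR="my-processor.nar"   # hypothetical custom NAR
PROPS="$INSTALL_DIR/conf/clockspring.properties"

# Demo setup only: fake an installation that contains one custom NAR.
mkdir -p "$INSTALL_DIR/lib" "$INSTALL_DIR/conf"
: > "$INSTALL_DIR/lib/$CUSTOM_NAR"
: > "$PROPS"

# 1. Create the second library directory.
mkdir -p "$CUSTOM_LIB"

# 2. Move the custom NAR out of the default lib directory.
mv "$INSTALL_DIR/lib/$CUSTOM_NAR" "$CUSTOM_LIB/"

# 3. Register the new directory, unless the property is already present.
grep -q '^nifi\.nar\.library\.directory\.custom=' "$PROPS" ||
  echo "nifi.nar.library.directory.custom=$CUSTOM_LIB" >> "$PROPS"

echo "custom NARs now in: $CUSTOM_LIB"
```

Checking for the property before appending keeps the step safe to repeat.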
Preserve Modified NARs
If you have modified any of the default NAR files, an upgrade will overwrite these changes. Preserve your customizations as follows:
-
Identify and save the changes you made to the default NAR files.
-
Perform your Clockspring upgrade.
-
Implement the same NAR file changes in your new Clockspring instance.
Clear Activity and Shutdown Existing Clockspring
On your existing Clockspring installation:
-
Stop all the source processors to prevent the ingestion of new data.
-
Allow Clockspring to run until there is no active data in any of the queues in the dataflow(s).
-
Shut down your existing Clockspring instance(s).
Install the new Clockspring Version
Install the new Clockspring into a directory parallel to the existing Clockspring installation.
-
Download the latest version of Clockspring.
-
Install the RPM.
-
If you are upgrading a Clockspring cluster, repeat these steps on each node in the cluster.
Host Machine - Node 1
|--> opt/
   |--> existing-clockspring
   |--> new-clockspring
Host Machine - Node 2
|--> opt/
   |--> existing-clockspring
   |--> new-clockspring
Host Machine - Node 3
|--> opt/
   |--> existing-clockspring
   |--> new-clockspring
Make sure that all file and directory ownerships for your new Clockspring directories match what you set on the existing directories. |
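One way to carry ownership over is to read it from the existing installation directory and apply it recursively to the new one. The sketch below assumes GNU `stat` and `chown`, as shipped with the supported Linux distributions, and uses scratch directories in place of the real `existing-clockspring` and `new-clockspring` paths. It copies only the top-level directory's owner and group, so installations with mixed ownership need a finer-grained approach, and on a real system `chown` typically requires root.

```shell
#!/bin/sh
set -eu

# Demo setup only: scratch directories stand in for the two installations.
SCRATCH="$(mktemp -d)"
EXISTING_DIR="$SCRATCH/opt/existing-clockspring"
NEW_DIR="$SCRATCH/opt/new-clockspring"
mkdir -p "$EXISTING_DIR" "$NEW_DIR/conf"

# Read the numeric owner and group of the existing installation (GNU stat syntax).
owner_group="$(stat -c '%u:%g' "$EXISTING_DIR")"

# Apply the same ownership to everything under the new installation.
chown -R "$owner_group" "$NEW_DIR"

echo "new installation owned by $(stat -c '%u:%g' "$NEW_DIR")"
```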
Update the Configuration Files for Your New Clockspring Installation
Use the configuration files from your existing Clockspring installation to manually update the corresponding properties in your new Clockspring deployment.
In general, do not copy configuration files from your existing Clockspring version to the new Clockspring version. The newer configuration files may introduce new properties that would be lost if you copy and paste configuration files. |
Use the following table to guide the update of configuration files located in <installation-directory>/conf
.
Configuration file | Necessary changes |
---|---|
authorizers.xml |
Copy the If you are using the Configuration best practices recommend creating a separate location outside of the Clockspring base directory for storing such configuration files, for example: |
bootstrap-notification-services.xml |
Use the existing Clockspring bootstrap-notification-services.xml file to update properties in the new Clockspring. |
bootstrap.conf |
Use the existing Clockspring bootstrap.conf file to update properties in the new Clockspring. |
flow.xml.gz |
If you retained the default location for storing flows ( If you are encrypting sensitive component properties in your dataflow via the sensitive properties key in clockspring.properties, make sure the same key is used when copying over your flow.xml.gz. If you need to change the key, see the [sensitive_flow_migration] section below. |
clockspring.properties |
Use the existing clockspring.properties to populate the same properties in the new Clockspring file. Note: This file contains the majority of Clockspring configuration settings, so ensure that you have copied the values correctly. |
If you followed Clockspring best practices, the following properties should be pointing to external directories outside of the base Clockspring installation path. If the below properties point to directories inside the Clockspring base installation path, you must copy the target directories to the new Clockspring. Stop your existing Clockspring installation before you do this. |
|
If you have retained the default value ( If you stored flows to an external location, update the property value to point there. |
|
Same applies as above if you want to retain archived copies of the flow.xml.gz. |
|
Best practices recommend that you use an external location for each repository. Point the new Clockspring at the same external database repository location. |
|
Best practices recommend that you use an external location for each repository. Point the new Clockspring at the same external flowfile repository location. Warning: You may experience data loss if flowfile repositories are not accessible to the new Clockspring. |
|
Best practices recommend that you use an external location for each repository. Point the new Clockspring at the same external content repository location. Your existing Clockspring may have multiple content repos defined. Make sure the exact same property names are used and point to the appropriate matching content repo locations. For example:
Warning: You may experience data loss if content repositories are not accessible to the new Clockspring. Warning: You may experience data loss if property names are wrong or the property points to the wrong content repository. |
|
Best practices recommend that you use an external location for each repository. Point the new Clockspring at the same external provenance repository location. Your existing instance may have multiple provenance repos defined. Make sure the exact same property names are used and point to the appropriate matching provenance repo locations. For example:
Note: You may not be able to query old events if provenance repos are not moved correctly or properties are not updated correctly. |
|
state-management.xml |
For the If you have retained the default location ( Configuration best practices recommend that you move the state to an external directory like |
For a cluster, the |
|
For a cluster, make sure the |
|
If you are also setting up a new external ZooKeeper, see the [zookeeper_migrator] section for instructions on how to move ZooKeeper information from one cluster to another and migrate ZooKeeper node ownership. |
Updating the Sensitive Properties Key
The following command can be used to read an existing flow.xml.gz configuration and set a new sensitive properties key in clockspring.properties:
$ ./bin/clockspring.sh set-sensitive-properties-key <sensitivePropertiesKey>
The minimum required length for a new sensitive properties key is 12 characters.
Start New Instance
In your new installation:
-
Start each of your new instances.
-
Verify that:
-
All your dataflows have returned to a running state. Some processors may have new properties that need to be configured, in which case they will be stopped and marked Invalid.
-
All your expected controller services and reporting tasks are running again. Address any controller services or reporting tasks that are marked Invalid.
-
-
After confirming your new instances are stable and working as expected, the old installation can be removed.
If the original installation was set up to run as a service, update any symlinks or service scripts to point to the new version's executables. |
Processor Locations
Available Configuration Options
Clockspring provides 3 configuration options for processor locations. Namely:
nifi.nar.library.directory
nifi.nar.library.directory.<custom>
nifi.nar.library.autoload.directory
Paths set using these options are relative to the Clockspring Home Directory. For example, if the Clockspring Home Directory is /opt/clockspring, and the Library Directory is ./lib, then the final path is /opt/clockspring/lib.
|
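The path resolution described in the note above can be illustrated with a short shell sketch. It is illustrative only: the variable names are hypothetical, and it handles just the leading `./` form used in the examples in this guide.

```shell
#!/bin/sh
# Values from the example in the note above.
CLOCKSPRING_HOME=/opt/clockspring
LIBRARY_DIR=./lib

# Strip the leading "./" and join the remainder onto the home directory.
resolved="$CLOCKSPRING_HOME/${LIBRARY_DIR#./}"

echo "$resolved"   # prints /opt/clockspring/lib
```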
The nifi.nar.library.directory
is used for the default location for provided processors. It is not recommended to use this for custom processors as these could be lost during an upgrade. For example:
nifi.nar.library.directory=./lib
The nifi.nar.library.directory.<custom>
allows the admin to provide multiple arbitrary paths for Clockspring to locate custom processors. A unique identifier must be appended to the property name for each unique path. For example:
nifi.nar.library.directory.myCustomLibs=./my-custom-nars/lib
nifi.nar.library.directory.otherCustomLibs=./other-custom-nars/lib
The nifi.nar.library.autoload.directory
is used by the autoload feature, where Clockspring can automatically load new processors added to the configured path without requiring a restart. For example:
nifi.nar.library.autoload.directory=./autoload/lib
Installing Custom Processors
This section describes the original process for installing custom processors, which requires a restart of Clockspring. To use the Autoloading feature, see the Autoloading Custom Processors section below.
Firstly, we will configure a directory for the custom processors. See Available Configuration Options for more about these configuration options.
nifi.nar.library.directory.myCustomLibs=./my-custom-nars/lib
Ensure that this directory exists and has appropriate permissions for the clockspring user and group.
Now, we must place our custom processor NAR in the configured directory. The configured directory is relative to the Clockspring Home directory. For example, if the Clockspring Home directory is /var/lib/clockspring, we would place our custom processor NAR in /var/lib/clockspring/my-custom-nars/lib.
Ensure that the file has appropriate permissions for the clockspring user and group.
Restart Clockspring and the custom processor should now be available when adding a new Processor to your flow.
Autoloading Custom Processors
This section describes the process to use the Autoloading feature for custom processors.
To use the autoloading feature, the nifi.nar.library.autoload.directory
property must be configured to point at the desired directory. By default, this points at ./extensions
.
For example:
nifi.nar.library.autoload.directory=./extensions
Ensure that this directory exists and has appropriate permissions for the clockspring user and group.
Now, we must place our custom processor NAR in the configured directory. The configured directory is relative to the Clockspring Home directory. For example, if the Clockspring Home directory is /var/lib/clockspring, we would place our custom processor NAR in /var/lib/clockspring/extensions.
Ensure that the file has appropriate permissions for the clockspring user and group.
Refresh the browser page and the custom processor should now be available when adding a new Processor to your flow.
NAR Providers
Clockspring supports fetching NAR files for the autoloading feature from external sources. This is done with NAR Providers, referred to below as External Resource Providers, which serve as connectors between an external data source and Clockspring.
When configured, an External Resource Provider polls the external source for available NAR files and offers them to the framework. The framework then fetches new NAR files and copies them to
the nifi.nar.library.autoload.directory
for autoloading.
By default, the polling will happen every 5 minutes. It is possible to change this frequency by specifying the property nifi.nar.library.poll.interval
.
By default, NAR files will be downloaded if no file with the same name exists in the folder defined by nifi.nar.library.autoload.directory. Setting the nifi.nar.library.conflict.resolution property selects a different conflict resolution strategy. Currently, the following strategies are supported:
Name |
Description |
IGNORE |
Will not replace files: if a file exists in the directory with the same name, it will not be downloaded again. |
REPLACE |
Will replace a file in the target directory if the file available in the source has a newer modification date. |
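The difference between the two strategies can be sketched as a small shell function. This illustrates the semantics described in the table; it is not the framework's implementation. The function name `should_fetch` and the file names are hypothetical, and the `-nt` (newer than) test is assumed to be available, as it is in bash and dash.

```shell
#!/bin/sh
# should_fetch STRATEGY SOURCE TARGET
# Returns 0 when SOURCE should be copied over TARGET, non-zero otherwise.
should_fetch() {
  # No file with that name in the target directory yet: fetch under either strategy.
  [ -e "$3" ] || return 0
  case "$1" in
    IGNORE)  return 1 ;;            # never replace an existing file
    REPLACE) [ "$2" -nt "$3" ] ;;   # replace only if the source is newer
    *)       echo "unknown strategy: $1" >&2; return 2 ;;
  esac
}

# Demo: two scratch files with different modification times.
SCRATCH="$(mktemp -d)"
older="$SCRATCH/older.nar"
newer="$SCRATCH/newer.nar"
touch -t 202001010000 "$older"
touch -t 202101010000 "$newer"

if should_fetch REPLACE "$newer" "$older"; then
  echo "REPLACE: source is newer, file would be replaced"
fi
if ! should_fetch IGNORE "$newer" "$older"; then
  echo "IGNORE: existing file is kept"
fi
```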
Until the first External Resource collection succeeds for every provider, the service prevents Clockspring from finishing startup. To override this behaviour, set the nifi.nar.library.restrain.startup property.
With the value true, the service prevents Clockspring from starting up until the collection succeeds; with false, it does not. The default value is true if the property is not set.
An External Resource Provider can be configured by adding the nifi.nar.library.provider.<providerName>.implementation
property with value containing the proper implementation class. Some implementations might need
further properties. These are defined by the implementation and must be prefixed with nifi.nar.library.provider.<providerName>.
.
The <providerName>
is arbitrary and serves to correlate multiple properties together for a single provider. Multiple providers might be set, with different <providerName>
. Currently, Clockspring supports an HDFS-based NAR provider.
HDFS External Resource Provider
This implementation is capable of downloading files from an HDFS file system.
The value of the nifi.nar.library.provider.<providerName>.implementation
must be org.apache.nifi.flow.resource.hadoop.HDFSExternalResourceProvider
.
The following additional properties are defined by the provider:
Name | Description |
---|---|
resources |
List of HDFS resources, separated by comma. |
source.directory |
The source directory of NAR files within HDFS. Note: the provider does not check for files recursively. |
storage.location |
Optional. If set, this value overrides the storage location defined in core-site.xml. |
kerberos.principal |
Optional. Kerberos principal to authenticate as. |
kerberos.keytab |
Optional. Kerberos keytab associated with the principal. |
kerberos.password |
Optional. Kerberos password associated with the principal. |
Example configuration:
nifi.nar.library.provider.hdfs1.implementation=org.apache.nifi.flow.resource.hadoop.HDFSExternalResourceProvider
nifi.nar.library.provider.hdfs1.resources=/etc/hadoop/core-site.xml
nifi.nar.library.provider.hdfs1.source.directory=/customNars

nifi.nar.library.provider.hdfs2.implementation=org.apache.nifi.flow.resource.hadoop.HDFSExternalResourceProvider
nifi.nar.library.provider.hdfs2.resources=/etc/hadoop/core-site.xml
nifi.nar.library.provider.hdfs2.source.directory=/other/dir/for/customNars
Diagnostics
It is possible to run diagnostics on Clockspring with
$ ./bin/clockspring.sh --diagnostics --verbose <dumpfilePath>
During the diagnostic, Clockspring sends a request to an already running Clockspring instance, which collects information about clusters, components, part of the configuration, memory usage, etc., and writes it to the specified file or, failing that, to the logs.
The verbose switch is optional and can be used to control the level of diagnostic detail. If no dump file path is given, Clockspring writes the diagnostics information to the bootstrap.log file.
Automatic diagnostics on restart and shutdown
Clockspring supports automatic diagnostics in the event of a shutdown. The feature is disabled by default. The settings can be found in the clockspring.properties file and the feature can be enabled there also. In the case of a lengthy diagnostic, Clockspring may terminate before the diagnostics are completed. In this case, the graceful.shutdown.seconds property should be set to a higher value in the bootstrap.conf.