Expose Lucene features to CouchDB via Erlang RPC.
Originally dubbed ziose, version 3.x replaces the foundation of Clouseau with
ZIO as an asynchronous scheduler. Additionally, this version depends on the Java interface provided by Erlang/OTP.
This project uses a combination of asdf for tool management and direnv. The direnv tool is provided by the asdf-direnv plugin.
All tools managed by asdf are configured in .tool-versions, which looks something like the following:
# pre-requisite
direnv 2.33.0
# build tools
java openjdk-21.0.2
scala 2.13.16
# erlang needs java so it should be after it in the list
erlang 26.2.5.13
Additional dependencies you may need to install manually on MacOS:
- Homebrew
- asdf via brew install asdf
- coreutils via brew install coreutils
- git via brew install git
- Xcode
If you don't already have the asdf + asdf-direnv combination on your system, a few extra steps are required.
The steps are documented in full detail here. Essentially, they are:
- Install asdf with e.g. brew install asdf
- Verify your OS has all the tools we need using scripts/cli verify
- Use the step-by-step guide script to finish installation (you might need to call it multiple times): scripts/cli bootstrap
- Restart your shell and cd into the project directory
- Enable configuration by calling direnv allow
Please refer to the styleguide for details on development style.
To simplify project maintenance we provide a cli command. This command becomes available in your terminal
when you cd into the project directory.
Currently, cli provides the following commands:
- await: Await clouseau node to finish start up
- bootstrap: A step-by-step guide to help set up environment
- commands: List all commands
- deps: A set of dependency management commands
- fmt: Reformat scala code
- gh: Low level access to GitHub related commands
- help: Show help for all commands
- issue: Issue management
- logs: Get recent logs filename for terminated clouseau node
- processId: Get clouseau PID
- start: Start clouseau node
- stop: Stop clouseau node
- tdump: Do a java thread dump
- verify: Verify development dependencies
- zeunit: Run zeunit tests
You can find detailed documentation in scripts/cli.md.
Unlike previous versions, Clouseau 3.x is configured via a HOCON formatted app.conf file, in addition to the numerous JVM command line options available. The top level app.conf file in this project briefly documents the various options, but those relevant to performance and scalability are discussed in more detail below.
The max_indexes_open option specifies the maximum number of indexes that can be open at a given time. Each Lucene index opened by Clouseau carries an overhead, so if indexes are allowed to open without bounds, the JVM will run out of memory. By default this is set to 100, but for large deployments with many active indexes this number may need to be increased significantly. Once the limit is reached, Clouseau will close the index that has been open the longest.
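The eviction rule described above can be illustrated with a minimal sketch. It is a simplified model, not Clouseau's actual implementation: the class name `OpenIndexes` and its methods are hypothetical, and real indexes are replaced by plain names.

```scala
import scala.collection.mutable

// Hypothetical model of a bounded set of open indexes; not Clouseau's code.
final class OpenIndexes(maxIndexesOpen: Int) {
  require(maxIndexesOpen > 0, "maxIndexesOpen must be positive")

  // LinkedHashSet iterates in insertion order, so `head` is the index opened first.
  private val opened = mutable.LinkedHashSet.empty[String]

  /** Opens `name`; returns the index that had to be closed to make room, if any. */
  def open(name: String): Option[String] =
    if (opened.contains(name)) None // already open, nothing to evict
    else {
      val evicted =
        if (opened.size >= maxIndexesOpen) {
          val oldest = opened.head // open for the longest time
          opened -= oldest
          Some(oldest)
        } else None
      opened += name
      evicted
    }

  /** Names of the currently open indexes, oldest first. */
  def names: List[String] = opened.toList
}
```

With a limit of 2, opening `a`, `b` and then `c` evicts `a`, regardless of how recently `a` was used. This is why the limit alone can close an index that is still active.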
The close_if_idle and idle_check_interval_secs options allow closing search indexes if there is no activity within a specified interval. As mentioned above, when the number of open indexes reaches the max_indexes_open limit, Clouseau will close the index that was opened first, even if there is activity on that index (which can be problematic). These options were created to close idle indexes first, to hopefully avoid reaching the limit specified in max_indexes_open.
If close_if_idle is true, Clouseau will monitor the activity of its indexes and close any with no activity in two consecutive idle check intervals. By default idle_check_interval_secs is 300 seconds, which means an index is closed once it has had no activity for between 301 and 600 seconds.
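The 301 to 600 second window can be reproduced with a small calculation. The sketch below assumes a mark-and-sweep style check: the first check after the last activity marks the index as idle, and the next check closes it if the mark is still there. This mechanism is an assumption chosen because it matches the window stated above; the object and method names are hypothetical.

```scala
// Hypothetical model of the idle check; the mechanics are an assumption.
object IdleCheckSketch {

  /** Seconds between the last activity and the close, assuming checks run at
    * multiples of `intervalSecs` and an activity at an exact check time counts
    * as happening just after that check.
    */
  def secondsUntilClosed(lastActivitySec: Long, intervalSecs: Long): Long = {
    require(intervalSecs > 0, "intervalSecs must be positive")
    require(lastActivitySec >= 0, "lastActivitySec must not be negative")
    // The first check strictly after the activity marks the index as idle.
    val firstCheck = (lastActivitySec / intervalSecs + 1) * intervalSecs
    // The following check finds the mark untouched and closes the index.
    val secondCheck = firstCheck + intervalSecs
    secondCheck - lastActivitySec
  }
}
```

With the default interval of 300, an activity one second before a check (`secondsUntilClosed(299, 300)`) yields 301, and an activity right after a check (`secondsUntilClosed(300, 300)`) yields 600.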
The command line option -Xmx sets the maximum heap size for the JVM. Heap usage depends on the number of open search indexes and on the search load (sorting etc.). The amount of heap required usually correlates with the max_indexes_open setting, so more open indexes require more memory.
The recommendation is to start with one third of the available memory and never to allocate more than 50% of the total available memory. So if the nodes in the cluster have 30GB of memory available, limit -Xmx to 10GB; if that is not enough and the workload still requires more memory, try increasing it up to 15GB (50% of available). Be cautious when exceeding one third of the available memory, as it leaves less memory for the Erlang runtime and the OS.
The -Xms option configures the minimum JVM heap size. Setting it is recommended when the maximum heap size is large (> 8GB). If set, use 80% of -Xmx. This lets the JVM reserve its initial memory when Clouseau starts, avoiding dynamic heap resizing and the lags it causes.
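The sizing guidance for the two heap options can be turned into a small calculation. The following is a minimal sketch: the ratios (one third, one half, 80%, and the 8GB threshold) come from the text above, while `HeapSizing`, `Recommendation` and the rounding down to whole gigabytes are illustrative assumptions.

```scala
// Hypothetical helper; ratios follow the guidance above, names are illustrative.
object HeapSizing {

  final case class Recommendation(
      xmxGb: Long,        // starting point: one third of available memory
      xmxCeilingGb: Long, // hard upper bound: half of available memory
      xmsGb: Option[Long] // only suggested for heaps larger than 8GB
  ) {
    /** JVM flags for the starting recommendation. */
    def jvmFlags: String =
      (s"-Xmx${xmxGb}g" +: xmsGb.map(gb => s"-Xms${gb}g").toList).mkString(" ")
  }

  /** Whole-gigabyte recommendation (integer division rounds down). */
  def recommend(availableGb: Long): Recommendation = {
    require(availableGb >= 3, "need at least 3GB to derive a non-zero heap")
    val xmx     = availableGb / 3
    val ceiling = availableGb / 2
    val xms     = if (xmx > 8) Some(xmx * 8 / 10) else None
    Recommendation(xmx, ceiling, xms)
  }
}
```

For the 30GB example, `HeapSizing.recommend(30)` gives a 10GB starting heap, a 15GB ceiling, and `jvmFlags` of `-Xmx10g -Xms8g`. On a 12GB node the heap stays below the 8GB threshold, so only `-Xmx4g` is produced.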
JConsole can be connected to a running Clouseau 3.x instance through the standard JMX interface as follows:
- Run Clouseau first: make clouseau1
- Open another terminal and type make jconsole
- Select MBeans -> com.cloudant.clouseau
The Scala Build Tool sbt can be used directly to compile and start up the service, or to run individual unit tests, e.g.
sbt console
sbt "testOnly com.cloudant.ziose.clouseau.ClouseauTypeFactorySpec"
It can also be used for interactive experimentation, as the following console session demonstrates:
❯ sbt
...
[info] started sbt server
sbt:ziose> console
[info] Starting scala interpreter...
Welcome to Scala 2.13.16 (OpenJDK 64-Bit Server VM, Java 21.0.2).
Type in expressions for evaluation. Or try :help.
scala> import zio._
import zio._
scala> import zio.Console._
import zio.Console._
scala> import zio.stream.ZStream
import zio.stream.ZStream
scala> val stream = ZStream(1,2,3,4).merge(ZStream(9,8,7,6))
val stream: zio.stream.ZStream[Any,Nothing,Int] = zio.stream.ZStream@aad94db
scala> val tapped = stream.tap(x => printLine(s"${x}"))
val tapped: zio.stream.ZStream[Any,java.io.IOException,Int] = zio.stream.ZStream@7f983cdd
scala> Unsafe.unsafe { implicit unsafe => Runtime.default.unsafe.run(tapped.runDrain) }
9
8
7
6
1
2
3
4
val res0: zio.Exit[java.io.IOException,Unit] = Success(())
scala> :q