
1 # Contributing to Zstandard
2 We want to make contributing to this project as easy and transparent as
24 In order to accept your pull request, we need you to submit a CLA. You only need
25 to do this once to work on any of Facebook's open source projects.
30 Zstd uses a branch-based workflow for making changes to the codebase. Typically, zstd
31 will use a new branch per sizable topic. For smaller changes, it is okay to lump multiple
51 git checkout -b <branch-name>
52 git push origin <branch-name>
57 git add -u && git commit -m <message>
58 git push origin <branch-name>
60 * Note: run local tests to ensure that your changes didn't break existing functionality
71 …* Before sharing anything to the community, create a pull request in your own fork against the dev…
74 below to see how to do this.
76 … * When you are ready to share your changes with the community, create a pull request from your branch
77 to facebook:dev. You can do this very easily by clicking 'Create Pull Request' on your fork's home
81 … * Examine the diff presented between the two branches to make sure there is nothing unexpected.
83 * While there is no strict template that our contributors follow, we would like them to
89 …* Are there background references and documents that reviewers should be aware of to properly asse…
90 …* Note: make sure to point out any design and architectural decisions that you made and the ration…
91 …* Note: if you have been working with a specific user and would like them to review your work, mak…
95 … * You will have to iterate on your changes with feedback from other collaborators to reach a point
97 * To avoid too many comments on style and convention, make sure that you have a
106 … that the change must make it to an official zstd release for it to be meaningful. We recommend
108 … their change makes it to the next release of zstd. Users will often discover bugs in your code or
109 … suggest ways to refine and improve your initial changes even after the pull request is merged.
113 executing it. It usually helps us find many simple bugs. Zstd uses clang's `scan-build` tool for
114 …install it by following the instructions for your OS on https://clang-analyzer.llvm.org/scan-build.
122 In general, you can use `scan-build` to statically analyze any build script. For example, to statically ana…
126 scan-build make -C contrib/largeNbDicts largeNbDicts
130 `scan-build` is part of our regular CI suite. Other static analyzers are not.
132 It can be useful to look at additional static analyzers once in a while (and we do), but it's not a…
134 - Static analyzers are full of false positives. The signal-to-noise ratio is actually pretty low.
135 - A good CI policy is "zero-warning tolerance". That means that all issues must be solved, includin…
136 - Multiple static analyzers will feature multiple kinds of false positives, sometimes applying to th…
137 …+ tortuous code, trying to please multiple constraints, hurting readability and therefore maintena…
138 …+ sometimes, these constraints are mutually exclusive: if one tries to solve one, the other static …
139 - As if that were not enough, the list of false positives changes with each version. It's hard enough…
141 …elps expressing the code in a way which is more readable or more difficult to misuse. These kind o…
146 longer to run than others. Currently, our CI is set up to run a short
147 series of tests when creating a PR to the dev branch and a longer series of tests
148 when creating a PR to the release branch. You can look in the configuration files
151 Most people will just want to create a PR with the destination set to their local dev
153 re-run tests and cancel running tests from the PR page or from the respective CI's dashboard.
155 … on GitHub Actions (configured at `.github/workflows`), which will automatically run on PRs to your
157 These require work to set up on your local fork, and (at least for Travis CI) cost money.
158 Therefore, if the PR on your local fork passes GitHub Actions, feel free to submit a PR
161 ### Third-party CI
162 A small number of tests cannot run on GitHub Actions, or have yet to be migrated.
163 For these, we use a variety of third-party services (listed below). It is not necessary to set
164 these up on your fork in order to contribute to zstd; however, we do link to instructions for those
165 who want earlier signal.
168 -----------|--------------------------------------------------------------------------------------…
169-x86 architectures such as PowerPC | https://docs…
170 … | https://www.appveyor.com/blog/2018/10/02/github-apps-integration/ <br> h…
171 … | https://github.com/marketplace/cirrus-ci/ …
172 to provide faster signal,<br/> but we may be able to migrate these to Github Actions | https://cir…
174 Note: the instructions linked above mostly cover how to set up a repository with CI from scratch.
175 The general idea should be the same for setting up CI on your fork of zstd, but you may have to
176 follow slightly different steps. In particular, please ignore any instructions related to setting up
181 landscape and corresponding trade-offs have been adequately analyzed, reproduced, and presented.
182 This high bar for performance means that every PR which has the potential to
183 impact performance takes a very long time for us to properly review. That being said, we
184 always welcome contributions to improve performance (or worsen performance for the trade-off of
190 time to search through old issues and pull requests using keywords specific to your
191 would-be PR. Of course, just because a topic has already been discussed (and perhaps rejected
193 it will be helpful for you to have context from that topic's history before contributing.
194 2. The distinction between noise and actual performance gains can unfortunately be very subtle
195 especially when microbenchmarking extremely small wins or losses. The only remedy to getting
197 take the time to run extensive, long-duration, and potentially cross-(os, platform, process, etc)
198 benchmarks on your end before submitting a PR. Of course, you will not be able to benchmark
200 you can:) We've added some things to think about when benchmarking below in the Benchmarking
203 legitimate thing to do as long as it does not harm the overall performance health of Zstd.
204 This is a hard balance to strike but please keep in mind other aspects of Zstd when
205 submitting changes that are clang-specific, windows-specific, etc.
210 is a good place to start.
213 Unfortunately, the most important aspect in being able to benchmark reliably is to have a stable
215 will typically not be stable enough to obtain reliable benchmark results. If you can get your
218 Of course, benchmarking can be done on non-hyper-stable machines as well. You will just have to
219 do a little more work to ensure that you are in fact measuring the changes you've made and not
220 noise. Here are some things you can do to make your benchmarks more stable:
222 1. The most simple thing you can do to drastically improve the stability of your benchmark is
223 to run it multiple times and then aggregate the results of those runs. As a general rule of
224 thumb, the smaller the change you are trying to measure, the more samples of benchmark runs
225 you will have to aggregate over to get reliable results. Here are some additional things to keep in
227 * How you aggregate your samples is important. You might be tempted to use the mean of your
228 results. While this is certainly going to be a more stable number than a raw single sample
229 benchmark number, you might have more luck by taking the median. The mean is not robust to
238 * Most processors will take some time to get `hot` when running anything. The observations
243 and then hand picking a good threshold after which the variance in results seems to stabilize.
245 another cpu/memory-intensive application in the background. If you are running benchmarks on your
250 * If you have multiple cores, you can even run your benchmark on a reserved core to prevent
251 pollution from other OS and user processes. There are a number of ways to do this depending
254 …s, you can "Set Processor Affinity" using https://www.thewindowsclub.com/processor-affinity-windows
255 to use their dedicated affinity API https://developer.apple.com/library/archive/releasenotes/Perfo…
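On Linux, for example, one way to pin the benchmark to a reserved core is `taskset` (a sketch: core number 2 and the benchmark command are arbitrary):

```shell
# Run the benchmark on CPU core 2 only, so the scheduler
# does not bounce it between cores mid-measurement
taskset -c 2 ./zstd -b1 -i5 file.txt
```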
256 3. To benchmark, you will likely end up writing a separate C/C++ program that links libzstd.
263 6. Try to avoid storage. On some systems you can use tmpfs. Putting the program, inputs and outputs…
269 The fastest signal you can get regarding your performance changes is via the built-in zstd CLI
271 and then additionally specify the `-b#` option. Doing this will run our benchmarking pipeline
272 for the options you have just provided. If you want to look at the internals of how this
276 very first thing you should use to assess whether you actually achieved any sort of improvement
277 is `zstd -b`. You might try to do something like this. Note: you can use the `-i` option to
282 $ git checkout <commit-before-your-change>
283 $ make && cp zstd zstd-old
284 $ git checkout <commit-after-your-change>
285 $ make && cp zstd zstd-new
286 $ zstd-old -i5 -b1 <your-test-data>
287 1<your-test-data> : 8990 -> 3992 (2.252), 302.6 MB/s , 626.4 MB/s
288 $ zstd-new -i5 -b1 <your-test-data>
289 1<your-test-data> : 8990 -> 3992 (2.252), 302.8 MB/s , 628.4 MB/s
292 Unless your performance win is large enough to be visible despite the intrinsic noise
293 on your computer, benchzstd alone will likely not be enough to validate the impact of your
295 changed but there could be a small <3% improvement that the noise on the host machine
296 obscured. So unless you see a large performance win (10-15% consistently) using just
300 There are a number of great profilers out there. We're going to briefly mention how you can
306 code that you think can be made to run faster.
308 The first thing you will want to do is make sure that the piece of code is actually taking up
309 a notable amount of time to run. It is usually not worth optimizing something which accounts for le…
310 0.0001% of the total running time. Luckily, there are tools to help with this.
312 If your target code snippet is only part of a function, it might be worth trying to
313 isolate that snippet by moving it to its own function (this is usually not necessary but
317 functions for you. Your goal will be to find your function of interest in this call graph
318 and then inspect the time spent inside of it. You might also want to look at the
323 whose performance can be improved upon. Follow these steps to profile your code using
331 usually be long enough). This way the profiler will have something to work with
332 and you will have ample time to attach your profiler to this process:)
335 $ zstd -b1 -i5 <my-data> # this will run for 5 seconds
337 5. Once you run your benchmarking script, switch back over to Instruments and attach your
338 process to the time profiler. You can do this by:
340 * Selecting your process from the dropdown. In my case, it is just going to be labeled
347 8. You should be able to see your call graph.
350 you will have to supply the `-g` flag along with your build script. You might also
351 have to provide the `-fno-omit-frame-pointer` flag
352 9. Dig down the graph to find your function call and then inspect it by double clicking
353 the list item. You will be able to see the annotated source code and the assembly side by
359 leave you to check that out if you're getting started:
364 * Use `perf stat -r # <bench-program>` to quickly get some relevant timing and
367 * Perf has a long list of hardware counters that can be viewed with `perf list`.
368 When measuring optimizations, something worth trying is to make sure the hardware
369 counters you expect to be impacted by your change are in fact being so. For example,
370 if you expect the L1 cache misses to decrease with your change, you can look at the
371 counter `L1-dcache-load-misses`
379 We use GitHub issues to track public bugs. Please ensure your description is
380 clear and has sufficient instructions to be able to reproduce the issue.
387 It's a pretty long topic, which is difficult to summarize in a single paragraph.
388 As a rule of thumb, try to imitate the coding style of
390 The following is a non-exhaustive list of rules employed in the zstd code base:
394 with 2 extensions: 64-bit `long long` types, and variadic macros.
395 This rule is applied strictly to code within `lib/` and `programs/`.
396 Sub-projects in `contrib/` are allowed to use other conventions.
404 This design requirement is fundamental to preserve the portability of the code base.
406 - Reduce dependencies to the minimum possible level.
411 - Within `lib/`, this policy is even more drastic.
416 Other accepted non-symbol headers are `<stddef.h>` and `<limits.h>`.
417 - Within the project, there is a strict hierarchy of dependencies that must be respected.
418 `programs/` is allowed to depend on `lib/`, but only its public API.
423 - Functions in `lib/` must use very little stack space,
426 or require a scratch buffer to be emplaced manually.
430 + private symbols, with a scope limited to their own unit, are free of this restriction.
432 each symbol name must attempt to be (and remain) unique.
437 - `PREFIX_prefix2_camelCase`
438 - `PREFIX_camelCase_extendedQualifier`
439 * Multi-word names generally consist of an action followed by an object:
440 - for example: `ZSTD_createCCtx()`
442 - `goBackward` rather than `notGoForward`
444 except that they are allowed and even invited to start with an uppercase letter.
455 Any variable that can be `const` (i.e. read-only) **must** be `const`.
458 `const` variables are an important signal to readers that this variable isn’t modified.
459 …Conversely, non-const variables are a signal to readers to watch out for modifications later on in…
466 to control any condition the code expects for its correct execution.
473 Whenever applicable, prefer employing the code as the primary way to convey explanations.
476 * At declaration level, the documentation explains how to use the function or variable
488 By contributing to Zstandard, you agree that your contributions will be licensed