|EMN, INRIA, DIKU|
ARRAY_SIZE(array.html, array.cocci). This semantic patch first checks that the header file
ARRAY_SIZEis included, then converts various code patterns to calls to
ARRAY_SIZEand finally removes local macros.
The semantic patch could be extended to cases where
is not already included, but it is not easy to add a new include file in
the "right" place.
The rules for introducing calls to
parentheses around the matched expression. The isomorphism
paren causes code where these parentheses are absent to also
be considered. The isomorphism is applied after the parse tree is
constructed for the rule, so dropping the parentheses does not introduce
the possibility of associativity problems.
One of the rules in this semantic patch checks for a case where the size of
the array is divided by the size of the type of its elements. The required
type information is often stored in header files. To improve performance,
Coccinelle normally only considers the header files in the current
directory and those in the include path whose name corresponds to the name
of the given C file. The option
-all_includes causes all of
the header files mentioned in the C file to be included. When treating a
directory, a header file is processed each time it is included. If there
are transformations that affect the header file, these will appear more
than once in the output.
DIV_ROUND_UP(round.html, round.cocci). This semantic patch is similar to the one for introducing
BUG_ON(bugon.html, bugon.cocci). When debugging is enabled,
BUG_ONexpands to a conditional that invokes
BUGif its argument is true. The advantage of using
BUG_ONrather than such a conditional is that when debugging is not desired,
BUG_ONcan be defined to expand to nothing at all. This semantic patch converts a conditional that has only
BUGin its then branch to a call to
BUG_ON. This transformation, however, is only safe when the tested expression has no side effects. Since
BUG_ONcan discard its argument, the rule checks that this expression does not contain any function calls or assignments. A more complicated rule could be written that would lift such a term out of the conditional test and rewrite the conditional to test the result. In the case of a test expression that does not involve an assignment, this would require introducing a new variable to store the result. Such a variable could be declared as a SmPL metavariable using
fresh identifier, in which case the user would be prompted for the identifier name.
One "function call" that does not have a side effect and is often used in
unlikely. The first rule
of the semantic patch treats cases involving such calls. It disables the
unlikely, which can cause some redundant matches,
unlikely expression is an expression itself.
!x&y(notand.html, notand.cocci). An expression of this form is almost always meaningless, because it combines a boolean operator with a bit operator. In particular, if the rightmost bit of
yis 0, the result will always be 0. This semantic patch focuses on the case where
yis a constant. Another possibility is to consider the case where
yis an arbitrary expression (notand_exp.html, notand_exp.cocci). This rule contains a disjunction that causes it not to perform the transformation when
yis itself negated, as an expression of the form
!x&!ycan make sense.
!E && E1or
E || E1, where
E->fld(andand.html, andand.cocci). In either case,
E->fldrepresents a NULL pointer dereference. In each rule of the semantic patch, the header makes it explicit that the body of the rule represents an expression, which solves a SmPL parsing problem.
sizeofto the result of
sizeof(sizeof.html, sizeof.cocci). The rule considers separately the cases where the argument is a type or an expression. It would not be sufficient to replace the argument of the inner
...", because that would only match an expression.
u8, however, are sometimes not included directly in the C file, but in some file that the C file includes. Since spatch has no option to include files recursively, it would be necessary to write extra rules to find bugs in the use of these types.
DEFINE_MUTEXis declared among the metavariables as a
declarer namemeaning that the
DEFINE_MUTEXpattern should be parsed as a variable declaration rather than as a function call.
strict_strtolconvert a string to a signed integer. Sometimes, however, the result is stored in a location having an unsigned type. This semantic patch translates such calls to
strict_strtoul, respectively. It considers a type to be unsigned if it is anything other than
s32, thus getting around the problem of typedefs in header files encountered in the case of (find_unsigned.html, find_unsigned.cocci). False positives are, however, possible, if some other signed type is used.
continueat the bottom of a
forloop (continue.html, continue.cocci). This semantic patch removes a
continuein a conditional at the bottom of a
forloop. Unfortunately, due to the control-flow based nature of Coccinelle, this semantic patch sometimes considers a continue to be at the bottom of a
forloop when actually it is not.
xis preceded by another NULL test
xthat causes a return if
xis NULL. The next three rules consider whether the NULL test is useful because of some other control-flow path. For NULL tests that area useless, the rule
fixthen rewrites some cases where the NULL test is straightforward to eliminate. The Python rule at the end of the semantic patch prints out some information about cases that were not possible to fix.
!, with the intuition that the NULL value reflects the failure of the function call, and in the latter case, it uses a comparison to NULL. Other strategies could be implemented. Both rules in the semantic patch disable the isomorphisms is_zero and isnt_zero, which convert comparisons to 0 to simpler expressions involving just the compared value or its negation. For this transformation, we only want cases where there is an explicit comparison to 0.
extern, in which case the initialization may be more useful that it appears locally. This rule records the position of any such initialization in the position metavariable
p. The second rule finds a declaration of a variable that is at a position that is different from any of the recorded ones, and then checks that the only subsequent references to the variable simply initialize it to a constant. In this case, the declaration and all of the initializations are deleted.
Other variations could be considered. This semantic patch focuses on constants, because they have no side effects. Variables, however, would have the same property. Another variation would be when a variable is assigned to the result of calling a function, but the value is never used. In this case, the function call cannot be deleted, but it is at least possible that the variable is holding some sort of error code that should be checked or freshly allocated memory that should be stored or freed.
memset0 after a call to
alloc_bootmem(bootmem.html, bootmem.cocci). The Linux function
alloc_bootmemand variants all initialize the allocated data to 0 and panic if the allocation is not possible. This semantic patch removes any subsequent NULL test or
memsetwhen there is no previous reference to allocated data. The constraint that there be no previous reference to the allocated data is particularly important in the
memsetcase, as the
memsetcould be useful if it serves to reset the allocated memory to 0. In the NULL test case, the constraint could be relaxed to check only that there is no intervening reassignment, but it does not seem likely that this would find more bugs, since normally a NULL test would come before any accesses.
p2, respectively. The first two disjuncts seem redundant, since one matches an assignment as a statement and the other matches an assignment of the same form, but as an expression. The first case actually only serves to indicate to the SmPL parser that this rule is a disjunction of statements, as is needed to parse the final disjunct containing "
...", and not a disjunction containing only expressions.
The next two rules consider the possibility that the NULL test is there because of some other path that does not go through the dereference. In the first case, the path starts with an assignment of the location of interest, while in the second case there is no assignment and the path starts at the beginning of the function definition.
The last rule prints out the result in emacs org mode format, providing links to the dereference and the NULL test.
return, with or without an argument, 2) The then branch is a sequence of statements ending in a
return, with or without an argument, 3) The then branch is a sequence of statements ending in a
goto, which ultimately leads to a return.
mutex_lockis followed on every control-flow path either by a call to
mutex_unlockor a conditional ending in a return, which is assumed to represent error-handling code. If such a conditional does not contain a call to
mutex_unlock, then this is assumed to be a bug, and one is added by the semantic patch.
kmallocor a related function is not followed by a call to
kfree(kmalloc.html, kmalloc.cocci). This semantic patch makes the constraints that the variable storing the result of calling
kmallocis a local variable and that the only reference to this variable between the allocation point and the return is an initialization of its fields. The former constraint ensures that at the point of the allocation, there is no pointer accessible from outside the function to the allocated data. The latter constraint ensures that no such pointer is created, eg by passing the allocated data to another function which stores it in a global data structure.
Often, Linux error handling code performs a goto to a label at the end of
the function that performs a sequence of deallocations. The matching of a
goto in the first rule of this semantic patch only serves to record its
position to be used in creating the subsequent output, to make it easier to
see what control-flow path was used in detecting the bug. The semantic
patch contains two rules for printing out the results. The first rule is
used only if the position,
p3, of a goto is available. This
rule ends in
cocci.include_match(False), indicating the
bindings used in this rule should be discarded. The second rule is then
used for the other possible bindings, for the cases where there is no goto.
pci_get_deviceand some related functions have the property that they take an argument that is the starting point of a search, and after completing the search they decrement the reference count of this argument. This behavior is particularly convenient for iterating over a sequence of objects, as when the iteration completes all objects have been freed without any explicit reference count manipulation. When there is an early return from within such a loop, however, it may be necessary to decrement the reference count explicitly. This semantic patch detects and fixes such returns for
for_each_pci_devloops. In each case, the iteration variable is declared as
local idexpression, to ensure that it is a local variable, and thus not visible from outside the function.
for_each_pci_devis declared as an
iterator namein the second rule, so that it will be parsed correctly.
scsi_device_put(missing_put.html, missing_put.cocci). This rule is similar to the previous one, but it focuses on loop types that imply different reference count functions. Each of the various loop types is declared using
iterator namein only the first rule in which it appears. It is then considered to be an
iterator namethroughout the rest of the semantic patch.
local_irq_restore(local.html, local.cocci). A call to
local_irq_saveshould normally be followed by a call to
local_irq_restorealong all paths. This rule uses
when anyto allow any number of conditionals to appear between then call to
local_irq_restoreand the matched conditional (otherwise matching would stop at the first conditional, due to the shortest path constraint on "
when strictto ensure that all paths are checked, even those that are considered to be error paths (normally constraints on error paths are relaxed, as the code cannot be expected to eg free something if its allocation has failed).
FREEthat have to be manually replaced by the name of an allocation function and its corresponding deallocation function before using the sematic match. This semantic match targets more specialized resource allocation protocols than
kfree. It puts fewer constraints on what it reports as bugs, and thus may return more false positives, in particular because it allows the allocated value to be passed to another function, which may save it in some way, implying there is no need for a deallocation. However, the more specialized allocation functions are used less often, and thus we have not found the number of false positives to be burdensome in practice. If desired, one could replace
kfreein the kmalloc rule by the names of other allocation and deallocation functions and obtain potentially fewer false positives, but potentially more false negatives as well.
Concretely, the semantic match focuses on the allocation site (storing the result in a local variable) and each subsequent return that does not return either 0, some expression involving the allocated value, or some other non-NULL pointer. Between these points, the semantic match checks that there is no call to the deallocation function, either directly or under a conditional, that the return is not in a control-flow path that checks that the result of the allocation function is NULL, and that the allocated value has not been stored somewhere or overwritten. If these cases are satisfied, the Python script prints the position of the call to the allocation function and of the return.
The constraints on a return that is considered of interest and the path
between the call to the allocation function are purely heuristic and may
give good or bad results depending on how the allocation function is
expected to be used. For example, the constraint that a return of 0 is not
of interest derives from the assumption that 0 represents success and in a
success case, all allocated memory has been saved or freed in some way,
even if it is not apparent to the semantic patch. This assumption may be
reasonable in the case of an allocation function that is based on
kmalloc, but we have found that it is less likely to be valid
in the case of an "allocation" function whose effect is to increment a
reference count. Often a reference count should be both incremented and
decremented within a single function (similar to a lock), and thus should
in particular be decremented even if the containing function succeeds.
Some pairs of allocation and deallocation functions that we have considered
pci_dev_put. Semantic patches for
these, that are slightly different from alloc_free.cocci in that they try
to correct the bug when possible, are available in (iounmap_check.html, iounmap_check.cocci), (auth.html, auth.cocci), (ofname1.html, ofname1.cocci), (get_slot.html, get_slot.cocci), respectively. By trying to
correct the bug, however, these semantic patches may be less successful
than the corresponding instantiations of alloc_free.cocci, because they can
fail when multiple control-flow paths reach a return, as this is considered
to provide inconsistent information as to how the return should be updated.
SPIN_LOCK_UNLOCKEDis deprecated, as described in the Linux file
Documentation/spinlocks.txt. The semantic patch handles the case of a variable declaration or a dynamic initialization.
DEFINE_SPINLOCKis declared as a
declarer nameso that the SmPL parser knows that it is a valid replacement for a variable declaration; if
DEFINE_SPINLOCKwere considered to be the name of a function, it would not satisfy this constraint, resulting in a parse error.
The case where
SPIN_LOCK_UNLOCKED is used in a structure
initialization cannot be handled in general, because it would be necessary
to accumulate an arbitrary number of structure fields as the argument of
__SPIN_LOCK_UNLOCKED. Coccinelle could be used to find such
SPIN_LOCK_UNLOCKED, which then could be updated by
kmalloc/memsetconversion (kzalloc.html, kzalloc.cocci). This semantic patch translates a call to
kmallocfollowed by a call to
memsetinto a single call to
kzallocthat performs both operations. Many cases are considered to ensure that the call to
memsetis intended to initialize to zero the complete allocated data, rather than e.g., to reinitialized it after performing some other operations. The semantic patch also converts calls to
kzallocwhere the first argument is a product of the size of a structure and some other value to a call to
kcalloc. The latter function performs an overflow check before performing the multiplication.