Since your sample regexp contains a dot, it seems you want to match both floating-point numbers and integers. To account for the sign, start with an optional sign:
[+-]?
Following this, there must be a sequence of digits (at least one):
[0-9][0-9]*
(this can also be written as \d+)
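Here is a quick check in Python that the three spellings of "one or more digits" agree. Python is my choice here; the answer itself is language-agnostic. One caveat: in Python 3, \d in a str pattern also matches non-ASCII decimal digits, so \d+ and [0-9]+ only coincide when the re.ASCII flag is used. The helper name digit_run is just for illustration.

```python
import re

# Three spellings of "one or more digits".
PATTERNS = [r"[0-9][0-9]*", r"[0-9]+", r"\d+"]


def digit_run(s: str) -> bool:
    """Return True if s consists only of digits, checking that all three
    spellings agree. re.ASCII limits the digit class to 0-9."""
    results = {bool(re.fullmatch(p, s, re.ASCII)) for p in PATTERNS}
    assert len(results) == 1  # the three patterns never disagree
    return results.pop()


for s in ["0", "42", "007", "", "a1", "3.14"]:
    print(repr(s), digit_run(s))

# Without re.ASCII, \d also matches other Unicode decimal digits,
# e.g. ARABIC-INDIC DIGIT THREE (U+0663); [0-9] does not.
print(bool(re.fullmatch(r"\d+", "\u0663")))     # True
print(bool(re.fullmatch(r"[0-9]+", "\u0663")))  # False
```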
Next, optionally, a dot followed by another sequence of digits (which may be empty):
(\.\d*)?
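Putting the three pieces together (still without word boundaries), a small Python sketch can show which whole strings the pattern accepts. Because the fractional part uses \d*, a trailing dot such as 3. is accepted, while a leading dot such as .5 is not, since at least one integer digit is required. The name is_number is hypothetical.

```python
import re

# Optional sign, at least one digit, then an optional dot plus (possibly zero) digits.
NUMBER = re.compile(r"[+-]?\d+(\.\d*)?", re.ASCII)


def is_number(s: str) -> bool:
    # fullmatch: the whole string must be a number, not just a prefix of it.
    return NUMBER.fullmatch(s) is not None


for s in ["42", "-7", "+3.14", "3.", ".5", "1.2.3", "+"]:
    print(f"{s!r:9} {is_number(s)}")
```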
Finally, if you don't want these numbers to be glued to letters, put word boundaries at both ends. The final regex is:
\b[+-]?\d+(\.\d*)?\b
You can see it working in the demo.
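A minimal Python sketch of the boundary regex, assuming you want the whole matched text. Because the pattern contains a capturing group, re.findall would return only that group (the fractional part, or an empty string), so this uses finditer and group(0) instead. The sample text and the name find_numbers are my own.

```python
import re

# Word-bounded number: optional sign, digits, optional fraction.
NUMBER = re.compile(r"\b[+-]?\d+(\.\d*)?\b", re.ASCII)


def find_numbers(text: str) -> list[str]:
    # group(0) is the full match; group(1) would be just the fraction.
    return [m.group(0) for m in NUMBER.finditer(text)]


# abc123 and x9 are skipped because their digits are glued to letters.
print(find_numbers("abc123 42 3.25 x9"))  # ['42', '3.25']
```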
The demo shows three cases worth a closer look:
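- The right boundary prevents matching the whole of +15350.16f, because the fractional digits are glued to the f. The dot is a nonword character, and therefore a boundary, and since the fractional digits may be empty, the regex backs off and captures only the 15350. part (integer digits plus the dot), which is still a valid number.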
- Here the + sign is a nonword character, so it provides the left-side word boundary the number needs in order to be scanned.
- Here the left boundary makes the regex skip the first part of the number (e25), since it is glued to a letter. But the dot acts as a word boundary before the fractional part, so 42 is matched as a number on its own. This case is harder: we need more context to reject it.
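To reproduce these cases outside the demo, here is a Python sketch. The inputs are my own, chosen to have the same shape as the cases above; they are not the demo's actual text. It also exposes a quirk of putting \b before the optional sign: a sign preceded by a space or by the start of the string sits between two nonword positions, so there is no boundary there and the sign is dropped (y = -2 yields 2), while a sign right after a letter is kept (a+15 yields +15).

```python
import re

# Same boundary regex as above.
BOUNDED = re.compile(r"\b[+-]?\d+(\.\d*)?\b", re.ASCII)


def matches(text: str) -> list[str]:
    return [m.group(0) for m in BOUNDED.finditer(text)]


print(matches("+15350.16f"))  # ['15350.']  backs off to the dot
print(matches("a+15"))        # ['+15']     boundary between a and +
print(matches("e25.42"))      # ['42']      the unwanted match
print(matches("y = -2"))      # ['2']       sign lost after a space
```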
To handle the last case, we add some context before the number and accept or reject the number based on that context. We capture an optional letter in a first group; if that group matches anything, the whole match is discarded:
([a-zA-Z]?)
Putting it in front of our regexp (now without the word boundaries):
([a-zA-Z]?)([+-]?[0-9]+(\.[0-9]*)?)
The match is rejected if group 1 captured anything; if group 1 is empty, the number is in group 2. See demo2.
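Here is a Python sketch of the accept/reject step. It assumes the context-group pattern uses the optional sign [+-]? and the \.[0-9]* fraction from the first regexp, so unsigned numbers are found too. The function name numbers is hypothetical. The second call shows the drawback of this version: a letter directly before a signed number also lands in group 1, so x-3 is rejected.

```python
import re

# group 1: optional letter right before the number (the context)
# group 2: the number itself (optional sign, digits, optional fraction)
CONTEXT_NUMBER = re.compile(r"([a-zA-Z]?)([+-]?[0-9]+(\.[0-9]*)?)")


def numbers(text: str) -> list[str]:
    found = []
    for m in CONTEXT_NUMBER.finditer(text):
        if m.group(1):  # a letter is glued to the number: reject the match
            continue
        found.append(m.group(2))
    return found


print(numbers("e25.42 and 3.14"))  # ['3.14']  e25.42 is consumed and rejected
print(numbers("x-3"))              # []        letter + signed number rejected
```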
The demo also shows that a letter attached to a signed number, which could be perfectly valid (as in x-3), gets the match rejected, because the letter lands in the first group. To avoid this, we OR two alternatives together. The first has no sign:
([a-zA-Z]?)([0-9]+(\.[0-9]*)?)
The second is the original expression with a mandatory sign:
([+-][0-9]+(\.[0-9]*)?)
Now, if group 1 captured anything, the match is rejected as not a valid number. Group 2 holds an *unsigned floating point or integer number*, and group 4 a *signed floating point or integer number*. The final regexp is:
([a-zA-Z]?)([0-9]+(\.[0-9]*)?)|([+-][0-9]+(\.[0-9]*)?)
Refer to demo3.
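A Python sketch of how the final regexp might be consumed, assuming you only want the accepted numbers as strings. The helper extract_numbers is hypothetical. Only one alternative takes part in each match, so groups that did not participate come back as None; group 1 is None whenever the signed alternative matched, which the truthiness test handles.

```python
import re

# group 1: optional letter glued to an unsigned number (reject if non-empty)
# group 2: unsigned integer or float
# group 4: signed integer or float (sign mandatory)
FINAL = re.compile(r"([a-zA-Z]?)([0-9]+(\.[0-9]*)?)|([+-][0-9]+(\.[0-9]*)?)")


def extract_numbers(text: str) -> list[str]:
    result = []
    for m in FINAL.finditer(text):
        if m.group(1):  # letter attached: not a valid number
            continue
        # Exactly one of group 2 / group 4 participates in each match.
        result.append(m.group(2) if m.group(2) is not None else m.group(4))
    return result


print(extract_numbers("x-3"))           # ['-3']
print(extract_numbers("e25.42"))        # []
print(extract_numbers("-1.5 and +2"))   # ['-1.5', '+2']
print(extract_numbers("a = 3.14 + 2"))  # ['3.14', '2']
```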