PHP

Word Count (UTF-8 aware)

by @admin
Jun 17, 2026
May 31, 2026
Public
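Count words in a string by matching runs of Unicode letters and digits, so the count does not depend on ASCII whitespace. Built on the \p{L} and \p{N} Unicode property classes with the /u modifier. For scripts that do not separate words with spaces, such as Japanese or Chinese, each unbroken run of letters counts as a single word.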
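<?php
// Count maximal runs of Unicode letters (\p{L}) and digits (\p{N}).
function wordCountU(string $text): int {
    // preg_match_all() returns the number of full matches, or false on
    // error (for example, when $text is not valid UTF-8 under /u).
    $count = preg_match_all('/[\p{L}\p{N}]+/u', $text);
    return $count === false ? 0 : $count;
}

echo wordCountU("Hello world"), PHP_EOL;                   // 2
echo wordCountU("café au lait, s'il vous plaît"), PHP_EOL; // 7 (the apostrophe splits "s'il" into "s" and "il")
echo wordCountU("日本語のテスト 123"), PHP_EOL;              // 2 (the unspaced Japanese run is one match)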
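The pattern above has two side effects: an apostrophe inside a word ends the run, and a combining mark (\p{M}) in decomposed text does the same, so "plaît" written as "i" plus U+0302 counts as two words. A possible variant is sketched below. The function name wordCountWithMarks is hypothetical, and treating ' and U+2019 as word-internal joiners is an assumption that suits French or English text but not every language.

```php
<?php
// Hypothetical variant: same idea as wordCountU, but combining marks stay
// inside the run and an apostrophe between two runs joins them.
function wordCountWithMarks(string $text): int {
    $run = '[\p{L}\p{N}\p{M}]+';
    // ' (U+0027) or ’ (U+2019) counts as part of a word only when
    // another run of letters, digits, or marks follows it directly.
    $pattern = '/' . $run . '(?:[\'\x{2019}]' . $run . ')*/u';
    $count = preg_match_all($pattern, $text);
    // false means a PCRE error, such as invalid UTF-8 input.
    return $count === false ? 0 : $count;
}

echo wordCountWithMarks("café au lait, s'il vous plaît"), PHP_EOL; // 6
echo wordCountWithMarks("plai\u{0302}t"), PHP_EOL;                 // 1
```

Whether "s'il" should count as one word or two depends on the counting convention you need. This sketch also does nothing for scripts written without spaces, where a dictionary-based segmenter is required to find word boundaries.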