2024-07-07 18:49:38 -07:00

# bcp-47
[![Build][build-badge]][build]
[![Coverage][coverage-badge]][coverage]
[![Downloads][downloads-badge]][downloads]
[![Size][size-badge]][size]
Parse and stringify [BCP 47][spec] language tags.
## Install
[npm][]:
```sh
npm install bcp-47
```
## Contents
* [Use](#use)
* [API](#api)
  * [`bcp47.parse(tag[, options])`](#bcp47parsetag-options)
  * [`bcp47.stringify(schema)`](#bcp47stringifyschema)
  * [`schema`](#schema)
  * [`function warning(reason, code, offset)`](#function-warningreason-code-offset)
* [Related](#related)
* [License](#license)
## Use
```js
var bcp47 = require('bcp-47')
var schema = bcp47.parse('hy-Latn-IT-arevela')
console.log(schema)
console.log(bcp47.stringify(schema))
```
Yields:
```js
{ language: 'hy',
extendedLanguageSubtags: [],
script: 'Latn',
region: 'IT',
variants: ['arevela'],
extensions: [],
privateuse: [],
irregular: null,
regular: null }
'hy-Latn-IT-arevela'
```
## API
### `bcp47.parse(tag[, options])`
Parse a BCP 47 tag into a language schema.
Note that the algorithm is case-insensitive.
###### `options.normalize`
Whether to normalize legacy tags when possible (`boolean`, default:
`true`). For example, `i-klingon` does not match the BCP 47 language
algorithm but is valid nonetheless. It is suggested to use `tlh`
instead (the ISO 639-3 code for Klingon). When `normalize` is true,
passing `i-klingon`, or other deprecated tags, is handled as if their
suggested valid tag was given.
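
To illustrate what normalizing a legacy tag amounts to, here is a minimal sketch of a case-insensitive lookup. It is not the library's implementation: `preferred` and `normalizeLegacy` are hypothetical names, and the table is deliberately partial, with entries assumed from the preferred values listed in the IANA language subtag registry.

```js
// Hypothetical, partial table of legacy tags and their preferred replacements.
var preferred = {
  'i-klingon': 'tlh',
  'i-navajo': 'nv',
  'i-lux': 'lb',
  'zh-guoyu': 'cmn'
}

// Return the preferred tag for a known legacy tag, or the input unchanged.
function normalizeLegacy(tag) {
  var key = String(tag).toLowerCase()
  var known = Object.prototype.hasOwnProperty.call(preferred, key)
  return known ? preferred[key] : tag
}

console.log(normalizeLegacy('I-Klingon')) // => 'tlh'
console.log(normalizeLegacy('en-GB')) // => 'en-GB'
```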
###### `options.forgiving`
By default, when an error is encountered, an empty object is returned.
When in forgiving mode, all found values up to the point of the error
are included (`boolean`, default: `false`). For example, parsing
`en-GB-abcdefghi` returns an empty object by default (as the language
variant is too long), whereas in `forgiving` mode the `language` of
`schema` is populated with `en` and the `region` is populated with `GB`.
###### `options.warning`
When given, `warning` is invoked when an error is encountered
([`Function`][warning]).
###### Returns
[`Schema`][schema] — Parsed BCP 47 language tag.
###### Throws
When `tag` is `null` or `undefined`.
### `bcp47.stringify(schema)`
Compile a [`schema`][schema] into a BCP 47 language tag.
###### Returns
`string` — BCP 47 language tag.
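
As a rough picture of what compiling a schema involves, the following sketch joins the schema fields in tag order. It is a simplified illustration, not the library's code: `toTag` is a hypothetical name, and the `x` prefix for private-use subtags is taken from the BCP 47 syntax rather than from this document.

```js
// Simplified sketch: join the fields of a schema into a tag.
// `toTag` is a hypothetical name, not part of the library.
function toTag(schema) {
  // Legacy tags are stored whole, so they are returned as-is.
  if (schema.irregular) return schema.irregular
  if (schema.regular) return schema.regular

  var parts = []
  var extensions = schema.extensions || []
  var privateuse = schema.privateuse || []

  if (schema.language) {
    // `concat` accepts both strings and arrays; absent fields add nothing.
    parts = parts.concat(
      schema.language,
      schema.extendedLanguageSubtags || [],
      schema.script || [],
      schema.region || [],
      schema.variants || []
    )

    extensions.forEach(function (extension) {
      parts.push(extension.singleton)
      parts = parts.concat(extension.extensions)
    })
  }

  // Private-use subtags follow an `x` singleton.
  if (privateuse.length > 0) {
    parts.push('x')
    parts = parts.concat(privateuse)
  }

  return parts.join('-')
}

console.log(
  toTag({
    language: 'hy',
    extendedLanguageSubtags: [],
    script: 'Latn',
    region: 'IT',
    variants: ['arevela'],
    extensions: [],
    privateuse: [],
    irregular: null,
    regular: null
  })
) // => 'hy-Latn-IT-arevela'
```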
### `schema`
A schema may have the following properties. A schema is deemed empty
when it has neither `language`, `irregular`, `regular`, nor `privateuse`
(where an empty `privateuse` array is handled as no `privateuse`, too).
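
The emptiness rule above can be sketched as a small predicate. `isEmptySchema` is a hypothetical helper, not something the library exports; it only restates the definition in code.

```js
// Hypothetical helper: a schema is empty when it has no `language`,
// `irregular`, or `regular`, and no non-empty `privateuse` list.
function isEmptySchema(schema) {
  var hasPrivateuse =
    Array.isArray(schema.privateuse) && schema.privateuse.length > 0

  return (
    !schema.language && !schema.irregular && !schema.regular && !hasPrivateuse
  )
}

console.log(isEmptySchema({})) // => true
console.log(isEmptySchema({privateuse: []})) // => true
console.log(isEmptySchema({language: 'en'})) // => false
```

A check like this may be handy after parsing without `forgiving`, since an empty object is returned when an error is encountered.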
###### `schema.language`
Two or three character [ISO 639][iso-639] language code, four character
reserved language code, or 5 to 8 (inclusive) characters registered
language subtag (`string`). For example, `en` (English) or `cmn`
(Mandarin Chinese).
###### `schema.extendedLanguageSubtags`
Selected three-character [ISO 639][iso-639] codes (`Array.<string>`),
such as `yue` in `zh-yue-HK` (Chinese, Cantonese, as used in Hong Kong
SAR).
###### `schema.script`
Four character [ISO 15924][iso-15924] script code (`string`), such as
`Latn` in `hy-Latn-IT-arevela` (Eastern Armenian written in Latin
script, as used in Italy).
###### `schema.region`
Two alphabetical character [ISO 3166-1][iso-3166-1] code, or three
digit [UN M.49][un-m49] code (`string`). For example, `CN` in
`cmn-Hans-CN` (Mandarin Chinese, Simplified script, as used in China).
###### `schema.variants`
5 to 8 (inclusive) character language variants (`Array.<string>`), such
as both `rozaj` and `biske` in `sl-rozaj-biske` (San Giorgio dialect
of Resian dialect of Slovenian).
###### `schema.extensions`
List of extensions (`Array.<Object>`), each an object containing a one
character `singleton` (`string`), and a list of `extensions`
(`Array.<string>`). `singleton` cannot be `x` (case-insensitive), and
each entry in `extensions` must be between two and eight (inclusive)
characters. For example, an extension
would be `u-co-phonebk` in `de-DE-u-co-phonebk` (German, as used in
Germany, using German phonebook sort order), where `u` is the `singleton`
and `co` and `phonebk` are its extensions.
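
To make the nested shape concrete, here is a hand-written object for `de-DE-u-co-phonebk`, limited to the fields discussed so far, together with a lookup by singleton. The object is written out from the description above rather than produced by the parser, and `findExtension` is a hypothetical helper.

```js
// Hand-written from the property descriptions (not parser output);
// only the fields relevant to this example are included.
var german = {
  language: 'de',
  region: 'DE',
  extensions: [{singleton: 'u', extensions: ['co', 'phonebk']}]
}

// Hypothetical helper: find the subtags listed under a singleton,
// comparing case-insensitively. Returns `null` when there is no match.
function findExtension(schema, singleton) {
  var list = schema.extensions || []
  var wanted = singleton.toLowerCase()
  var index = -1

  while (++index < list.length) {
    if (list[index].singleton.toLowerCase() === wanted) {
      return list[index].extensions
    }
  }

  return null
}

console.log(findExtension(german, 'U')) // => ['co', 'phonebk']
console.log(findExtension(german, 't')) // => null
```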
###### `schema.privateuse`
List of private-use subtags (`Array.<string>`), where each subtag must
be between one and eight (inclusive) characters.
###### `schema.regular`
One of the `regular` tags (`string`): legacy tags which match the
algorithm's syntax but whose meaning is defined by the tag as a whole
rather than by its subtags.
Valid values are:
* `art-lojban`
* `cel-gaulish`
* `no-bok`
* `no-nyn`
* `zh-guoyu`
* `zh-hakka`
* `zh-min`
* `zh-min-nan`
* `zh-xiang`
###### `schema.irregular`
One of the `irregular` tags (`string`): tags which are seen as
invalid by the algorithm.
Valid values are:
* `en-GB-oed`
* `i-ami`
* `i-bnn`
* `i-default`
* `i-enochian`
* `i-hak`
* `i-klingon`
* `i-lux`
* `i-mingo`
* `i-navajo`
* `i-pwn`
* `i-tao`
* `i-tay`
* `i-tsu`
* `sgn-BE-FR`
* `sgn-BE-NL`
* `sgn-CH-DE`
### `function warning(reason, code, offset)`
Invoked when an error occurs.
###### Parameters
* `reason` (`string`) — English reason for failure
* `code` (`number`) — Code for failure
* `offset` (`number`) — Index-based position of error
###### Warnings
| code | reason |
| :--- | :--------------------------------------------------------------------- |
| 1 | Too long variant, expected at most 8 characters |
| 2 | Too long extension, expected at most 8 characters |
| 3 | Too many extended language subtags, expected at most 3 subtags |
| 4 | Empty extension, extensions must have at least 2 characters of content |
| 5 | Too long private-use area, expected at most 8 characters |
| 6 | Found superfluous content after tag |
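
A callback with this signature can simply record what it receives. The sketch below assumes nothing about the parser: `createCollector` is a hypothetical helper, and the callback is invoked by hand with made-up arguments to show the recorded shape.

```js
// Hypothetical helper: build a `warning` callback that records every call.
function createCollector() {
  var warnings = []

  function warning(reason, code, offset) {
    warnings.push({reason: reason, code: code, offset: offset})
  }

  return {warning: warning, warnings: warnings}
}

var collector = createCollector()

// Called by hand here; the reason and code come from the table above,
// the offset is an arbitrary illustrative value.
collector.warning('Too long variant, expected at most 8 characters', 1, 6)

console.log(collector.warnings.length) // => 1
console.log(collector.warnings[0].code) // => 1
```

With the library, `collector.warning` would be passed as `options.warning`, and `collector.warnings` inspected after parsing.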
## Related
* [`bcp-47-match`](https://github.com/wooorm/bcp-47-match)
— Match BCP 47 language tags with language ranges per RFC 4647
* [`bcp-47-normalize`](https://github.com/wooorm/bcp-47-normalize)
— Normalize, canonicalize, and format BCP 47 tags
* [`iso-3166`](https://github.com/wooorm/iso-3166)
— ISO 3166 codes
* [`iso-639-2`](https://github.com/wooorm/iso-639-2)
— ISO 639-2 codes
* [`iso-639-3`](https://github.com/wooorm/iso-639-3)
— ISO 639-3 codes
* [`iso-15924`](https://github.com/wooorm/iso-15924)
— ISO 15924 codes
* [`un-m49`](https://github.com/wooorm/un-m49)
— UN M49 codes
## License
[MIT][license] © [Titus Wormer][author]
<!-- Definitions -->
[build-badge]: https://img.shields.io/travis/wooorm/bcp-47.svg
[build]: https://travis-ci.org/wooorm/bcp-47
[coverage-badge]: https://img.shields.io/codecov/c/github/wooorm/bcp-47.svg
[coverage]: https://codecov.io/github/wooorm/bcp-47
[downloads-badge]: https://img.shields.io/npm/dm/bcp-47.svg
[downloads]: https://www.npmjs.com/package/bcp-47
[size-badge]: https://img.shields.io/bundlephobia/minzip/bcp-47.svg
[size]: https://bundlephobia.com/result?p=bcp-47
[npm]: https://docs.npmjs.com/cli/install
[license]: license
[author]: https://wooorm.com
[spec]: https://tools.ietf.org/html/bcp47
[warning]: #function-warningreason-code-offset
[schema]: #schema
[iso-639]: https://en.wikipedia.org/wiki/ISO_639
[iso-15924]: https://en.wikipedia.org/wiki/ISO_15924
[iso-3166-1]: https://en.wikipedia.org/wiki/ISO_3166-1
[un-m49]: https://en.wikipedia.org/wiki/UN_M.49