
Starting a hobby project - Question #1234

Closed
levirtevs opened this issue Dec 13, 2022 · 2 comments

Comments

@levirtevs

First question: I saw this mentioned in the docs. How do I set it up?
nlp

Second question: the system I have in mind is the same as the last point in the picture above. I think this approach could even scale to using NLP as a code parser, and I want to build a code refactoring tool with it.

@aigloss
Collaborator

aigloss commented Dec 15, 2022

Hi @levirtevs,

To use your custom language, you should implement a Tokenizer, a Normalizer, a Stemmer, and, if needed, a StopWords handler. You can check any of the language packages, such as https://github.com/axa-group/nlp.js/tree/master/packages/lang-en, for examples.
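As a rough illustration of what those four pieces do, here is a minimal plain-JavaScript sketch. It does not use the nlp.js base classes; the class names (`SimpleNormalizer`, `SimpleTokenizer`, `SimpleStopwords`, `SuffixStemmer`), the method names, and the `processText` helper are all assumptions made up for this example, so check the lang-en package for the real interfaces.

```javascript
// Hypothetical stand-ins for illustration; these are NOT the nlp.js classes.

class SimpleNormalizer {
  normalize(text) {
    // Lowercase, decompose accented characters, then drop the combining marks.
    return text.toLowerCase().normalize('NFD').replace(/[\u0300-\u036f]/g, '');
  }
}

class SimpleTokenizer {
  tokenize(text) {
    // Split on any run of characters that are not letters, digits, or apostrophes.
    return text.split(/[^\p{L}\p{N}']+/u).filter(Boolean);
  }
}

class SimpleStopwords {
  constructor(words) {
    this.words = new Set(words);
  }

  removeStopwords(tokens) {
    return tokens.filter((token) => !this.words.has(token));
  }
}

class SuffixStemmer {
  constructor(suffixes) {
    // Try longer suffixes first so that "ing" wins over "g", for example.
    this.suffixes = [...suffixes].sort((a, b) => b.length - a.length);
  }

  stem(token) {
    for (const suffix of this.suffixes) {
      // Keep at least three characters of stem to avoid over-stripping.
      if (token.length > suffix.length + 2 && token.endsWith(suffix)) {
        return token.slice(0, -suffix.length);
      }
    }
    return token;
  }
}

// Chain the steps: normalize, tokenize, drop stopwords, stem.
function processText(text, { normalizer, tokenizer, stopwords, stemmer }) {
  const tokens = tokenizer.tokenize(normalizer.normalize(text));
  return stopwords.removeStopwords(tokens).map((token) => stemmer.stem(token));
}

// Example configuration with made-up stopwords and suffixes.
const pipeline = {
  normalizer: new SimpleNormalizer(),
  tokenizer: new SimpleTokenizer(),
  stopwords: new SimpleStopwords(['the', 'a']),
  stemmer: new SuffixStemmer(['ing', 's']),
};

console.log(processText('The Dragons are flying', pipeline));
```

For a custom language, the stopword list and the suffix rules would be replaced with ones that fit that language's vocabulary and morphology.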

@aigloss closed this as not planned Dec 15, 2022
@jesus-seijas-sp
Contributor

Just one comment:
Fantasy languages are supported; you don't need to implement a Tokenizer, a Normalizer, or a Stemmer.
There are unit tests showing how it is able to understand Klingon:

test('Should work with fantasy languages', async () => {

Another test shows how the trigrams for language guessing are added automatically, so that the fantasy language can be recognized:

test('Should even guess the fantasy language', async () => {
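To give an idea of how trigram-based language guessing can work in principle, here is a small self-contained sketch. It is not the nlp.js implementation: the `trigrams` function, the `TrigramGuesser` class, and the scoring rule are assumptions chosen for this example. Each locale gets a trigram frequency profile built from its sample utterances, and a new text is assigned to the locale whose profile overlaps most with the text's own trigrams.

```javascript
// Illustration only: a tiny trigram-profile guesser, not nlp.js internals.

// Count the character trigrams of a text, padded with one space on each side.
function trigrams(text) {
  const padded = ` ${text.toLowerCase().replace(/\s+/g, ' ').trim()} `;
  const counts = new Map();
  for (let i = 0; i + 3 <= padded.length; i += 1) {
    const gram = padded.slice(i, i + 3);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

class TrigramGuesser {
  constructor() {
    // locale -> Map(trigram -> count)
    this.profiles = new Map();
  }

  // Merge the trigrams of a sample utterance into the locale's profile.
  addSample(locale, text) {
    const profile = this.profiles.get(locale) || new Map();
    for (const [gram, count] of trigrams(text)) {
      profile.set(gram, (profile.get(gram) || 0) + count);
    }
    this.profiles.set(locale, profile);
  }

  // Return the locale with the highest overlap score, or undefined if
  // no profile shares any trigram with the text.
  guess(text) {
    const query = trigrams(text);
    let best = { locale: undefined, score: 0 };
    for (const [locale, profile] of this.profiles) {
      let total = 0;
      for (const count of profile.values()) total += count;
      let score = 0;
      for (const [gram, count] of query) {
        // Weight each shared trigram by its relative frequency in the profile.
        score += count * ((profile.get(gram) || 0) / total);
      }
      if (score > best.score) best = { locale, score };
    }
    return best.locale;
  }
}

const guesser = new TrigramGuesser();
guesser.addSample('en', 'hello how are you today');
guesser.addSample('en', 'what is your name');
guesser.addSample('kl', 'nuqneH');
guesser.addSample('kl', "Qapla'");

console.log(guesser.guess('nuqneH'));
console.log(guesser.guess('how are you'));
```

With profiles built from the training utterances themselves, a language with no dedicated support can still be told apart from the others, as long as its samples have distinctive character sequences.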

In fact, it doesn't need to be a fantasy language; it can be a real language for which we have not implemented a Tokenizer, a Normalizer, or a Stemmer. Example: Vietnamese. Vietnamese is not officially supported by NLP.js, but when testing accuracy on the Amazon MASSIVE dataset, NLP.js achieves outstanding accuracy, close to that of huge models like XLM-R or mT5.
