Quality management in data collection projects: 3 things to consider
The demand for data collection projects has grown hugely in the past 10 years.
The fundamental work of these projects typically involves translating large
quantities of language data, which is then used to train Natural Language
Processing (NLP) or machine learning engines. Transee has adapted our language
quality assurance (LQA) strategies to meet this need, because data collection
projects also need to be managed according to specific requirements. Below are
several tips we’ve collected for data collection quality management that we
think are effective.
Translation quality management
Most localization experts are familiar with translation quality concepts, but
the process of running a language quality program can often be unclear to
their clients. To borrow an analogy from the manufacturing industry, LQA can
be described as a sampling process whose goal is to prevent defects from
reaching customers. A sample of a finished product is tested, and
non-conformances may be detected. An attempt is then made to identify the
causes of the non-conformances, if any, and to adopt preventive measures.
Replace the batch sample of a product with a selection from a translated text,
and you will have an idea of how LQA works. In an LQA, a text sample is
reviewed by an independent assessor who reports errors (non-conformances) and
assigns error categories and severities to them.
It can be summed up as an attempt to bring quality control to the very human
activity of translation. It proceeds on the assumption that language quality
is something that can be assessed and evaluated, and that any two (or more)
professional linguists can agree whether a translation meets certain
criteria or not.
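To make the idea of error categories and severities more concrete, here is a minimal sketch of how a reviewer's findings might be turned into a score. The severity weights, the pass threshold and all names (`LqaError`, `penalty_per_1000_words`, `passes`) are assumptions made up for illustration; real LQA models, including the ones clients use, define their own.

```python
from dataclasses import dataclass

# Hypothetical severity weights; a real LQA model defines its own values.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}


@dataclass
class LqaError:
    category: str  # e.g. "accuracy", "terminology", "grammar"
    severity: str  # must be a key of SEVERITY_WEIGHTS


def penalty_per_1000_words(errors, sample_word_count):
    """Sum the severity weights and normalize to a 1,000-word sample."""
    if sample_word_count <= 0:
        raise ValueError("sample_word_count must be positive")
    total = sum(SEVERITY_WEIGHTS[e.severity] for e in errors)
    return total * 1000 / sample_word_count


def passes(errors, sample_word_count, threshold=20.0):
    """Return True when the normalized penalty is at or below the threshold.

    The default threshold is an illustrative assumption, not a standard.
    """
    return penalty_per_1000_words(errors, sample_word_count) <= threshold
```

Normalizing by word count lets samples of different sizes be compared against the same threshold.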
As in the manufacturing industry, quality assurance in translation is about
tactically managing risk. Language quality managers typically focus on:
Selecting an appropriate sample size and ensuring consistent quality. A small
sample may not represent the general quality of the project, because it is
easy to miss the more challenging parts of the documents when using a small
sample. On the other hand, a large sample will affect the deadline and project
margin, and still may not provide a clear picture of quality.
Reviewing the quality of the LQA. An experienced LQA reviewer plays an
important role here: it’s important that the reviewer understands the subject,
comprehends all client instructions, works thoroughly, and keeps a good
balance between harshness and leniency in their assessment.
The inherent risks of a project can be managed through quality management
strategies, including:
· Sampling frequency
· Selecting representative samples. When translating a larger document, the
samples should be selected from different parts of the file, and these
samples should cover all the content types included in the project.
· Careful selection, vetting and training of LQA reviewers
· LQA follow-up. Following up on issues that arise in an LQA review will make
that review far more effective than an LQA with no follow-up.
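As a rough illustration of drawing a sample from different parts of a file, the sketch below splits a document's segments into equal consecutive parts and picks a few segments at random from each. The function name `sample_across_parts` and its parameters are hypothetical, and the choice of four parts is arbitrary; this is one possible approach, not a description of a specific tool.

```python
import random


def sample_across_parts(segments, parts=4, per_part=2, seed=0):
    """Return indices of sampled segments, spread across the whole file.

    The segment list is split into `parts` consecutive slices and up to
    `per_part` indices are drawn at random from each slice.
    """
    rng = random.Random(seed)  # fixed seed so a sample can be reproduced
    n = len(segments)
    picked = []
    for i in range(parts):
        start = i * n // parts
        end = (i + 1) * n // parts
        indices = list(range(start, end))
        k = min(per_part, len(indices))
        picked.extend(sorted(rng.sample(indices, k)))
    return picked
```

Covering every content type would need an extra step, for example grouping segments by content type first and sampling within each group.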
Adapting LQA to data collection projects
Data collection projects pose a number of challenges to translators. Clients
want every project to benefit their machine learning models, so they generally
give translators complex instructions. The source sentences are often
incomplete or fragmented and usually cover a wide range of subject matter,
requiring the translators to perform extensive research to become familiar with
the context. Text selections are often presented out of context and can be
ungrammatical, colloquial, slangy or very technical with a specialized
vocabulary. Finally, most data collection projects are translation-only, which
means that language solution providers can’t rely on the normal 3-step
workflow of translation, editing and proofreading by a second linguist.
The Transee advantage
As Transee has worked
with many data collection clients, we’ve
adopted the same LQA standards that our
clients use to evaluate deliveries from suppliers. These standards
are specific to data collection requirements and are different from the ones
applied to typical translation projects for reader consumption. For example,
they typically discourage reviewers from logging stylistic improvements, which
don’t add much benefit given the purpose and size of these projects. In some
cases, Transee has developed hybrid models that
draw on the client’s LQA standards but weigh
certain error categories or severities differently.
All LQA reviewers should attend training sessions alongside the translators to
ensure that everyone is on the same page about the parameters and objectives
of each project. We’ve found that this initial step results in a translation
and review team that is aligned and thoroughly understands all client
instructions, goals and the LQA process.
In our experience, a very hands-on approach is
necessary in managing data collection LQAs. The quality team often needs
to give suggestions at different phases of
the LQA process, train the reviewers further on specific aspects of the
process, correct misperceptions and act as moderators when the translators and
reviewers have differing interpretations of the client instructions and
requirements. Transee has built up
in-house expertise in this area,
and we’re familiar with the typical issues and questions that arise during
translation and LQA of data collection projects. With this database of
experience, we can guide the teams to settle issues in accordance
with the client’s expectations.
When it comes to sampling, we
distinguish between data collection projects that relate to training AI-powered
virtual assistants and projects aimed at training machine translation engines.
The data for virtual assistants can usually be sorted according to scenarios (smart home,
TV, music, driving directions, weather, planning, shopping lists,
and more) and we ensure the LQA samples cover as many
of these scenarios as possible. Preparing samples for virtual assistant data
can require a lot of up-front work, but we can be fairly confident that the
sample is representative—the non-LQAed content will consist of variations of
the same commands and responses. In projects related to training machine
translation engines, the language data is typically random, so the samples can
be randomly selected.
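The two sampling approaches described above could be sketched as follows. This is only an illustration under assumed inputs: virtual assistant data is taken to be a list of `(scenario, text)` pairs, and the function names `sample_by_scenario` and `sample_randomly` are hypothetical.

```python
import random
from collections import defaultdict


def sample_by_scenario(items, per_scenario, seed=0):
    """Virtual assistant data: draw up to `per_scenario` texts per scenario.

    `items` is a list of (scenario, text) pairs. Every scenario that
    appears in the data ends up represented in the sample.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for scenario, text in items:
        groups[scenario].append(text)
    sample = {}
    for scenario in sorted(groups):
        texts = groups[scenario]
        sample[scenario] = rng.sample(texts, min(per_scenario, len(texts)))
    return sample


def sample_randomly(segments, size, seed=0):
    """Machine translation data: plain random selection of segments."""
    rng = random.Random(seed)
    return rng.sample(segments, min(size, len(segments)))
```

Grouping by scenario first is what makes the up-front preparation heavier for virtual assistant data, since each item needs a scenario label before it can be sampled.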
Because of the huge volume of data to process, a large
number of linguists work on these projects. The variations in translators’ work
quality and the levels of difficulty in the content, however, are within
reasonable limits that satisfy the clients’ quality requirements. And we can
use the results of an LQA to provide extra training or clarity to a translator
with a higher error rate.
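One simple way to spot a translator with a higher error rate is to aggregate LQA results per translator, as in the sketch below. The input shape (translator, words reviewed, error count), the per-1,000-word normalization and the threshold are all assumptions for illustration, as are the names `error_rates` and `needs_follow_up`.

```python
from collections import defaultdict


def error_rates(reviews):
    """Return errors per 1,000 reviewed words for each translator.

    `reviews` is an iterable of (translator, words_reviewed, error_count).
    """
    words = defaultdict(int)
    errors = defaultdict(int)
    for translator, words_reviewed, error_count in reviews:
        words[translator] += words_reviewed
        errors[translator] += error_count
    return {t: errors[t] * 1000 / words[t] for t in words if words[t] > 0}


def needs_follow_up(rates, threshold):
    """List translators whose error rate exceeds a hypothetical threshold."""
    return sorted(t for t, rate in rates.items() if rate > threshold)
```

For example, with reviews `[("A", 1000, 5), ("B", 500, 10), ("A", 1000, 3)]` and a threshold of 10, only translator "B" would be flagged for extra training or clarification.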
For a quality management program in data collection projects to succeed, the
most important thing is that it be cost-effective and reliable in detecting
issues, especially failures to meet client instructions. Several factors
need to align for this to work. It’s a very hands-on process where all parties
involved—supply chain, quality, production, translators and LQA reviewers—must
be flexible and open to learning and improving.