Search

Showing top 19 results for "AI training data"

Advances in private training for production on-device language models

… While training on-device models directly from user data effectively improves the utility performance for applications such as NWP and smart text selection , protecting the privacy of user data for model training is important. …

Feb 21, 2024

Croissant: a metadata format for ML-ready datasets

… However, these formats were designed for data discovery rather than for the specific needs of ML data, such as the ability to extract and combine data from structured and unstructured sources, to include metadata that would enable responsible use of the data, or to describe ML usage characteristics… …

Mar 6, 2024

Intervening on early readouts for mitigating spurious features and simplicity bi

… We evaluated our approach on standard benchmark datasets known to contain spurious correlations Waterbirds , CelebA , CivilComments , MNLI . Each of these datasets contain groupings of data that share an attribute potentially correlated with the label in a spurious manner. …

Feb 2, 2024

VideoPrism: A foundational visual encoder for video understanding

… Pre-training data A powerful ViFM needs a very large collection of videos on which to train — similar to other foundation models FMs , such as those for large language models LLMs . Ideally, we would want the pre-training data to be a representative sample of all the videos in the world. …

Feb 22, 2024

Using AI to expand global access to reliable flood forecasts

… Training data are daily streamflow values from the Global Runoff Data Center over the time period 1980 - 2023. A single streamflow forecast model is trained using data from 5,680 diverse watershed streamflow gauges shown below to improve accuracy . …

Mar 20, 2024

AutoBNN: Probabilistic time series forecasting with compositional bayesian neural networks

… To fit a model like in the figure above, all it takes is the following 10 lines of code, using the scikit-learn –inspired estimator interface: import autobnn as ab model = ab.operators.Add bnns= ab.kernels.PeriodicBNN width=50 , ab.kernels.LinearBNN width=50 , ab.kernels.MaternBNN width=50 estimato… …

Mar 28, 2024

Cappy: Outperforming and boosting large multi-task language models with a small scorer

… This process involves creating a separate regression dataset specific to the downstream training data with the same data annotation process used to construct the pre-training data. …

Mar 14, 2024

Graph neural networks in TensorFlow

… Like most neural networks, a GNN is trained on a dataset of many labeled examples ~millions , but each training step consists only of a much smaller batch of training examples say, hundreds . …

Feb 6, 2024

A decoder-only foundation model for time-series forecasting

… This helps TimesFM understand the bigger picture and generalize better when provided with domain-specific contexts not seen during training. Zero-shot evaluation results We evaluate TimesFM zero-shot on data not seen during training using popular time-series benchmarks. …

May 8, 2024

Health-specific embedding tools for dermatology and pathology

… To evaluate this approach, we trained models on a downstream task using teledermatology data. Model training involved varying dataset sizes 12.5%, 25%, 50%, 100% to compare embedding-based linear classifiers against fine-tuning. …

Mar 8, 2024

Followed topics