This repository contains all the code used in the study titled `Early Detection of Eating Disorders using Social Media`.

We have removed all datasets used during this research as they are not publicly available.

The structure of the files is as follows:

1. **Initial approach.** Pipelines and GridSearch for parameter optimisation can be found in `baseline_selection`. The code that parses the CLEF dataset is in `data-handlers/parse_data.py`.
2. **Feature extraction.** LDA, LIWC and the analysis of the writing features can be found in `feature_extraction`.
3. **Simple multimodal approaches.** In chapter 5 of the project, we described how we used both the extracted features and the posts' embeddings to train several models. These can be found in `multimodal_approaches/MULTIMODAL_APPROACH.ipynb`.
4. **Multimodal approaches with personal information.** In chapter 6, we described how enhancing personal information can help to better identify users suffering from eating disorders (EDs). For that, we used DPP-EXPEI (`multimodal_approaches/DPP-EXPEI` and `multimodal_approaches/DPPEXPEI-Multimodal.ipynb`), an author profiling technique, and different weighting schemes (`multimodal_approaches/PI-multimodal.ipynb`).
5. **Real-world evaluation.** Chapter 7 describes how we performed a practical evaluation of the proposed approaches with a web application used by a group of volunteers (Spanish evaluation) and by ourselves (English evaluation). Its implementation can be found in `evaluation`.

Additionally, `data-handlers/collector.py` and `data-handlers/spanish-data` contain all the code needed to analyse Spanish data in the same way as the English data.
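
For readers unfamiliar with the pipeline and grid-search setup mentioned in item 1, the snippet below is a minimal sketch using scikit-learn. It is not the code in `baseline_selection`: the toy posts, the labels, the TF-IDF plus logistic regression pipeline, and the parameter grid are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Made-up toy posts and binary labels; the study's datasets are not public.
posts = [
    "I skipped every meal again today",
    "counting calories is all I think about",
    "I feel guilty after eating anything",
    "I need to lose more weight this week",
    "great football match last night",
    "just finished a new book, loved it",
    "the weather is lovely this morning",
    "planning a trip with friends next month",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical pipeline: text vectoriser followed by a linear classifier.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical grid; keys follow scikit-learn's "<step>__<parameter>" naming.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

# Exhaustively evaluate every combination with 2-fold cross-validation.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="f1")
search.fit(posts, labels)

print(search.best_params_)
```

After fitting, `search.best_estimator_` holds the pipeline refitted on all the data with the best-scoring parameter combination.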