🤖 AI Summary
This paper addresses the multi-label instruction classification task for “How To” search queries. We propose InstructNet, an approach that adapts XLNet to this task. Using wikiHow, we construct a multi-label dataset of 11,121 samples, each annotated with multiple instruction categories. Methodologically, InstructNet pairs pretrained Transformer encoders (XLNet and BERT) with a multi-label classification head and is evaluated using accuracy and macro F1-score as performance metrics. Experimental results show that the XLNet-based variant achieves 97.30% accuracy, with micro- and macro-averaged scores of 89.02% and 93%, respectively. These results support InstructNet’s effectiveness for categorizing instructional text.
📝 Abstract
People use search engines for a wide range of topics and items, from daily essentials to more aspirational and specialized objects, and search engines have become people's preferred resource. The How To prefix has become a familiar and widely used way to search for solutions to particular problems. Such searches let people find sequential instructions with detailed guidelines for accomplishing specific tasks. Categorizing instructional text is also essential for task-oriented learning and for building knowledge bases. This study uses How To articles to determine multi-label instruction categories. We present a dataset of 11,121 observations from wikiHow, where each record has multiple categories. To predict these categories, we employ transformer-based deep neural architectures such as Generalized Autoregressive Pretraining for Language Understanding (XLNet) and Bidirectional Encoder Representations from Transformers (BERT), among others. We evaluate the proposed architectures using accuracy and macro F1-score as performance metrics. This evaluation revealed both the strengths and the drawbacks of our strategy. Specifically, our implementation of the XLNet architecture achieved an accuracy of 97.30% and micro and macro average scores of 89.02% and 93%, respectively, a noteworthy result in multi-label classification. This accuracy and macro average score attest to the effectiveness of the XLNet architecture in our proposed InstructNet approach. By employing a multi-level strategy in our evaluation process, we gained a more comprehensive understanding of the effectiveness of the proposed architectures and identified areas for future improvement and refinement.
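To make the reported metrics concrete, the following is a minimal sketch of how accuracy and micro- and macro-averaged F1 can be computed for multi-label predictions. It is not the authors' evaluation code. The function names and the toy label matrices are hypothetical, and because the abstract does not say which notion of accuracy is used, both exact-match (subset) accuracy and per-label accuracy are shown as assumptions.

```python
from typing import List, Tuple

# Binary indicator matrix: rows are samples, columns are labels (0 or 1).
Matrix = List[List[int]]


def label_counts(y_true: Matrix, y_pred: Matrix) -> List[Tuple[int, int, int]]:
    """Return (true positives, false positives, false negatives) per label."""
    counts = []
    for j in range(len(y_true[0])):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    return counts


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Unweighted mean of the per-label F1 scores."""
    counts = label_counts(y_true, y_pred)
    return sum(f1_from_counts(*c) for c in counts) / len(counts)


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """F1 computed from counts pooled over all labels."""
    counts = label_counts(y_true, y_pred)
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return f1_from_counts(tp, fp, fn)


def subset_accuracy(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of samples whose full label set is predicted exactly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def label_accuracy(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of individual label decisions that are correct."""
    total = len(y_true) * len(y_true[0])
    correct = sum(
        1 for t, p in zip(y_true, y_pred) for a, b in zip(t, p) if a == b
    )
    return correct / total


# Hypothetical toy data: 4 samples, 3 labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

print(f"macro F1:        {macro_f1(y_true, y_pred):.4f}")
print(f"micro F1:        {micro_f1(y_true, y_pred):.4f}")
print(f"subset accuracy: {subset_accuracy(y_true, y_pred):.4f}")
print(f"label accuracy:  {label_accuracy(y_true, y_pred):.4f}")
```

Macro averaging weights every label equally, so rare categories affect the score as much as frequent ones, while micro averaging pools all label decisions and is dominated by frequent categories. This is one reason the two averages can differ on the same predictions.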